

THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 














Volume XXXIV March, 1943 Number 3 








AN ANNUAL TEN-DAY GUIDANCE PROGRAM— 
METHODS AND RESULTS 


VERNON JONES 
Clark University 


In 1934 an unusual guidance program was organized at Worcester Polytechnic Institute¹ which gave juniors and seniors from secondary schools an opportunity to live at an engineering college for a ten-day period and to study their aptitudes, achievements, and interests against the background of the requirements and activities of a variety of fields. The program has functioned annually since that time. The purpose of this article is to present a brief description of the plan, which has not hitherto been described in the literature, and to give the results of a nine-year study of its operation. The program was organized to help juniors and seniors in secondary schools clarify their vocational choices and, especially, plan their immediate educational future. To date, three hundred fifteen boys have participated in the program.

The keynote of the plan has been self-guidance, in the sense that each boy has been given a wide variety of experiences and tests during his stay at the college and has been helped to judge his own interests and aptitudes in the light of increased knowledge about himself.

¹ Dean Jerome Howe was the originator of the plan and has served as Chairman of the program from its inception. Professor Paul R. Swan is the Executive Director. The writer has been responsible for the testing, the guidance interviews, and the present follow-up study. In this capacity the writer is indebted to members of the Worcester Polytechnic Institute faculty who have served in the guidance program, to Mr. Richard Marden, who has assisted with the tests and has made a follow-up study, and to Mr. William Herrmann and Miss Phyllis Bieberbach, who have made additional follow-up investigations. None of these earlier studies has been published, but the latter two were Master's theses at Clark University and are on file at the University Library.

Since the program was conducted at an engineering college and in a city of varied industries, the part of the program involving firsthand study of actual work in various jobs was probably strongest in the different fields of engineering. It should be emphasized, however, that the program has been genuinely a guidance program and not one for selecting boys for an engineering college or career. Efforts have been made to help the boys understand and appreciate the values and the method of approach of the liberal arts college, by requiring attendance at one or more lectures by a representative of a liberal arts college and by conferences with individual boys. Similarly, attempts have been made to have the boys understand and appreciate the work of advanced schools of applied mechanics, in which there is less stress upon mathematics and general scientific background than in the strict engineering colleges.

Among the problems raised by the boys there have been many besides those involving a choice among a liberal arts college, an engineering college, or an institute of applied mechanics. Some of the boys, for example, wanted to know their chances of success in professional schools such as those of medicine or law. Some faced conflicting ambitions held for them within their families. Some were seeking advice as to whether an extra year in a preparatory school between high school and college would be advisable for them. Some had confused a certain facility in manual and mechanical operations with aptitude for engineering, and wanted to know what educational and vocational plans to make when they were 'interested in engineering' but had only mediocre marks in mathematics and science. These and many other problems have been raised by the boys and their parents.


EACH BOY LEARNS ABOUT HIMSELF AND ABOUT JOBS 


In this guidance program special efforts have been made to provide varied opportunities for each boy to learn about himself. In the first place, in the free association of the boys with each other in the dormitory, on trips, in the laboratories, and in the discussions after lectures, the boys were given a chance roughly to evaluate their achievement and interest along specific lines in comparison with those of others who were thinking of going on to college. They had a chance to rate themselves against the background of what the college instructors assume in the way of knowledge and interests in their lectures and informal discussions.

They learned about the everyday activities in different jobs 
and about the nature of the education required for those types 
of jobs. Since practically every boy who attended the guidance 
program was considering engineering as one possible field of 
study, the rather intensive exposure of the group to the various 
branches of engineering served very definitely to help each boy to 
clarify his thinking about, or interest in, engineering. Every 
attempt was made, of course, to present various occupations and 
the education for those occupations in a true light, divorced from 
any artificial coloring that might result from overemphasis on 
the exciting and dramatic in the field. This was supplemented 
by lectures, by discussions with individual boys, and by suggested 
readings on occupations in which they showed interest. 


FACTS GATHERED ABOUT EACH BOY 


In addition to the above formal provisions for the boys to learn about themselves, careful studies were made of the aptitudes, achievements, interests, and personality and character traits of each individual. By means of standardized group tests all boys were tested in the following areas: mathematical aptitude and achievement, English achievement, general scholastic aptitude, aptitude for spatial visualization, and chemistry 'aptitude.' For the measurement of vocational interests, the Strong Vocational Interest Blank was used and scored for nine different occupations. For personality and character measurements, composite ratings were employed, based on the results of from five to seven independent ratings on a specially prepared rating scale. These ratings were made by instructors, some of whom observed the boys as they worked on practical tasks (or 'work samples') in the laboratories, and some of whom observed the boys in the dormitory, on trips, and at lectures and discussions. Each boy was rated on each of the following: carefulness, alertness and attention, reliability, initiative, rapidity of work, persistence, and general promise in engineering. Probably the most distinctive measuring device employed was the 'work samples.' In each of the four major branches of engineering and in physics and chemistry a separate work sample was provided. In each of these the boy was required to work with concrete materials in the laboratory or in the field and perform some simple operations typical of that branch of study. Each boy was carefully observed and rated not only on his success in the work but on his way of attacking the problem and the genuineness of his interest in it.

In addition to these test results and ratings, the staff had a brief evaluation of each boy by his secondary-school principal or guidance teacher. These evaluations were usually accompanied by a complete record of the boy's high-school marks; but since the program began at the time that many of the schools were closing for the summer vacation, it was not always convenient for the schools to prepare the complete transcript for us. Two of the main weaknesses of our program on the fact-finding side were the lack of high-school marks for some boys and the unequal standards of marking in the different schools from which our boys came, which made it impossible to judge their study habits adequately.


THE TEST BATTERY 


The test battery consisted of the Iowa Mathematics Aptitude Test, the Carnegie Mental Ability Test, the Yale Spatial Visualization Test, and the Iowa Chemistry Aptitude Test.¹ All test results were converted into comparable percentile scores, using a combination of the percentiles on the Carnegie and Iowa Mathematics tests as a base. This made possible a ready comparison of a student's standing in different areas and also facilitated combining all the tests when that was desired. The Carnegie Test was our longest test and proved to be, all things considered, the most valuable single test in the battery. It was, therefore, weighted more heavily than any other test in our battery. It yielded separate scores on verbal and numerical abilities (each based on three subtests) and a total scholastic aptitude score based on these two sections plus two other subtests.

¹ The tests used in 1941 and 1942 were in essentially the same areas, but newer tests from the Coöperative Test series have been substituted for the English and chemistry tests, and a reading test and a second scholastic aptitude test have been added. The test battery was kept constant for the first six years of the guidance program.

For guidance purposes the results from the separate tests, ungrouped or grouped only into such areas as mathematical or verbal, were especially valuable, but at times it was helpful to have an all-round measure based on all the tests. Since at the beginning we had no regression equations, based on the correlations of the tests with the later success of the boys, from which to determine exact weights for the tests in a total score,¹ we used as a composite test score the median of the following seven measures: Iowa Mathematics Aptitude, Carnegie Numerical, Yale Spatial, Iowa Chemistry, Carnegie Verbal, Iowa English, and Carnegie Total.

CONFERENCES WITH THE BOYS AND THEIR PARENTS 


During the last day of the guidance program the writer had 
an individual conference with each boy. This was in addition to 
informal chats which various members of the staff had with the 
boys from time to time. These conferences were twenty minutes 
in length. In a sense, these were the climax of the ten-day 
program. They were designed to bring together all the test 
results and the ratings of each boy, and to share these results 
with him in discussing his problems of educational and vocational 
planning. Efforts were made throughout the program to have 
each boy come to the conference with definite questions in mind, 
and stress was placed on the desire of the counselor to help him to 
solve his problems rather than to give guidance in any dictatorial 
or arbitrary sense. 

By way of preparation for the conferences a profile chart was plotted for each boy, showing his relative and absolute standing on the different tests and featuring particularly his relative strength in the verbal field and in the mathematical and scientific fields. Also included on this chart were the results from the Vocational Interest Blank and from the work samples. In addition, a study was made of the marks obtained in secondary school in relation to the aptitude test results. Finally, a conference was held among six or eight members of the staff who had been in the best position to observe the abilities, interests, and personality qualities of the boys during the guidance program. The purpose of this conference was to talk over the assets, liabilities, problems, and plans of each boy. It was always held the day before the individual conferences with the boys began. All results from tests and ratings were available as a factual basis for the discussion of each boy's problems and plans. The writer participated in this conference, attempting to get from the various members of the group all possible information that might help in counseling each boy.

¹ In a forthcoming article partial correlations, regression equations, and multiple correlations are given for essentially similar tests which were used later with all entering freshmen at Worcester Polytechnic Institute. These results, as well as the follow-up results given later in this article, indicate that for the guidance purposes for which this composite score was used it was very satisfactory, at least for boys interested in studying engineering in an institution with standards similar to those at Worcester Polytechnic Institute.

In the individual conferences the boys were always eager to 
know their results on the tests and usually had several questions 
concerning their educational and vocational planning. The test 
results were given, not in specific numerical terms but rather in 
general terms as depicted on the individual profile chart. Empha- 
sis was placed not only on the standardized test results, but also 
on the results on the interest questionnaire and on secondary- 
school marks. These were interpreted for the boy in such a way 
as to indicate to him his relative strengths and weaknesses, and 
his absolute standing in comparison with others with whom he 
would have to compete if he entered a liberal arts college or an 
engineering school. By preliminary research a critical minimum 
score was arrived at for use in guiding those boys who were inter- 
ested in entering Worcester Polytechnic Institute. A similar 
critical minimum score, although somewhat less clear cut, was 
arrived at from the literature on prediction in liberal arts colleges 
and applied in a tentative manner. 
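The composite test score (the median of the seven percentile measures listed earlier) and its comparison with the critical minimum score can be sketched in a few lines. The threshold of 65 is taken from the article's footnote defining the critical minimum score as a median percentile of 65; the function names, the dictionary keys, the sample percentiles, and the choice to treat a composite of exactly 65 as meeting the threshold are illustrative assumptions, not the program's recorded procedure.

```python
from statistics import median

# The seven measures whose median formed the composite test score.
# Each value is assumed to be already converted to a percentile on the
# program's common percentile scale.
MEASURES = (
    "Iowa Mathematics Aptitude",
    "Carnegie Numerical",
    "Yale Spatial",
    "Iowa Chemistry",
    "Carnegie Verbal",
    "Iowa English",
    "Carnegie Total",
)

# Critical minimum score: a median percentile of 65 (from the footnote).
CRITICAL_MINIMUM = 65


def composite_score(percentiles: dict[str, float]) -> float:
    """Median of the seven percentile scores; raises if any is missing."""
    missing = [name for name in MEASURES if name not in percentiles]
    if missing:
        raise ValueError(f"missing measures: {missing}")
    return median(percentiles[name] for name in MEASURES)


def meets_critical_minimum(percentiles: dict[str, float]) -> bool:
    """Assumption: a composite exactly at 65 counts as meeting the score."""
    return composite_score(percentiles) >= CRITICAL_MINIMUM


# Hypothetical boy: strong in mathematics and spatial work, weak verbally.
boy = dict(zip(MEASURES, (80, 72, 75, 60, 40, 50, 70)))
print(composite_score(boy), meets_critical_minimum(boy))  # 70 True
```

Because the composite is a median rather than a weighted sum, a single very high or very low percentile can shift it by at most one position in the sorted list, which suits a rough all-round measure used before any regression weights were available.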

In addition to the discussion of aptitudes, interests, and school 
achievement as they applied to the boys’ questions, some atten- 
tion was given to the boys’ occupational information. Readings 
were frequently suggested and made readily available. 

At the end of the guidance program one or both parents of approximately fifty per cent of the boys had an interview with the writer. It was the policy to have the boy present at these conferences; he was encouraged to tell his parents what he had gotten out of his private conference, and this was used as a point of departure for family discussion and planning with the counselor. The test results, interest ratings, etc., were presented to the parents in the same way as they had been given and interpreted to the boy. To those parents who did not have an interview with the counselor, a letter was written reviewing the main points of the conference with the boy.

It is interesting to note in passing that a rather large number of the boys' problems stemmed from personality and family considerations. Prominent among these were the failure of a boy to achieve up to his capacity or up to what his parents expected of him, and conflict within the family over vocational ambitions for the boy. Getting such issues into the open and talking them over with the counselor appeared to be valuable and was much appreciated by both the boys and the parents.


RESULTS 


In an attempt to evaluate the program, two types of inves- 
tigation have been conducted. First, a follow-up study has been 
made of all boys who entered Worcester Polytechnic Institute to 
determine the accuracy with which the tests and ratings pre- 
dicted success in the college. Secondly, the attitudes of the boys 
and of their parents toward the guidance program were inves- 
tigated by a questionnaire several years after the boys’ atten- 
dance and after they had had an opportunity to test out the 
guidance in actual planning. 

The Academic Follow-up of the Boys Who Entered Worcester 
Polytechnic Institute—Probably the most significant evaluation 
of the program which we have been able to make has been a 
follow-up study of the boys who entered Worcester Polytechnic Institute in 1938 or earlier and who have had time to graduate. Of the three hundred fifteen boys who have attended the guidance program to date, forty-three satisfied the conditions for this part of the study. Of this number, twenty-nine were above our critical minimum score¹ and were encouraged in their plans to enter an engineering college roughly in proportion to the degree to which their scores exceeded the critical minimum score. Fourteen were below the critical minimum score and were warned of the difficulties which they were likely to face in an engineering college. However, the guidance was not given in any mandatory or dictatorial manner, and there was no attempt to link the admission standards of the college to this critical minimum score. From the point of view of testing our critical minimum score, it is fortunate that some of the boys were willing to take the risk of entering in the face of some discouraging counsel.

¹ The critical minimum score adopted after some preliminary research was a median percentile of 65 on the seven measures mentioned above. This is the 65th percentile of high-school seniors as given in the norms for the Carnegie Mental Ability Test and the Iowa Mathematics Aptitude Test.

What percentage of each of these groups graduated? In answering this question we have, perhaps, been over-conservative in our classifications: we have counted among those not graduating not only those who failed but also those who withdrew or transferred to other colleges, and we have counted among those graduating both those who finished in the normal four years and those who graduated one year late. The upshot of this part of the study can be presented very briefly by a 2 × 2 table and a tetrachoric r based upon it.


TABLE I. THE RELATION BETWEEN BEING ABOVE OR BELOW THE CRITICAL MINIMUM SCORE AND GRADUATION FROM WORCESTER POLYTECHNIC INSTITUTE

                 NOT GRADUATED   GRADUATED   TOTAL
Above CMS              8             21        29
Below CMS              9              5        14
Total                 17             26        43


It will be seen that, of the twenty-nine boys who were above the critical minimum score and were encouraged in their plans to enter an engineering college, twenty-one, or 72.4 per cent, graduated, whereas of the fourteen who were below the critical minimum score only five, or 35.7 per cent, graduated. The tetrachoric r based on Table I is .46. The r (Pearson product-moment) between the composite scores on the tests and the average marks at Worcester Polytechnic Institute (for the entire time that each man was there) was found to be .48.¹ These results are based only on the test results and are very conservative. Of the eight boys who were above the CMS but did not graduate, two were making very good records when they withdrew. Two others were rated quite low by the staff on the mental, personality, and character qualities needed for success in engineering, and in addition one had only a 'C' rating on interest in engineering on the Strong Blank. Perhaps it would be fairer, therefore, to charge against our predictions only six misses out of twenty-seven, instead of eight out of twenty-nine, in which case the proportion of those above the CMS who graduated would become 78 per cent.² Moreover, of the five students who were below the CMS and yet graduated, two required five years to do it, leaving only twenty-one per cent of those below the CMS who graduated in the normal time.

¹ The r between the composite scores on the tests and average marks in the freshman year was found to be .54. This was based on eighty-three cases, the total number who had spent as much as a year at Worcester Polytechnic Institute when the study was completed. The comparable r based on the cases which were studied over four years was .52.
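The percentages in this paragraph follow directly from the cell counts of Table I. The sketch below recomputes them; the two adjustments (dropping the two above-CMS boys who withdrew with very good records, and counting only the three below-CMS boys who graduated in four years) follow the text, while the function and variable names are illustrative. The article itself rounds the last two figures to 78 and 21 per cent.

```python
# Cell counts from Table I.
ABOVE_GRADUATED, ABOVE_NOT_GRADUATED = 21, 8
BELOW_GRADUATED, BELOW_NOT_GRADUATED = 5, 9


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def graduation_rates() -> dict[str, float]:
    above_total = ABOVE_GRADUATED + ABOVE_NOT_GRADUATED  # 29
    below_total = BELOW_GRADUATED + BELOW_NOT_GRADUATED  # 14
    return {
        # 21 of 29 above the CMS graduated.
        "above": percent(ABOVE_GRADUATED, above_total),
        # 5 of 14 below the CMS graduated.
        "below": percent(BELOW_GRADUATED, below_total),
        # Dropping the two who withdrew with good records: 21 of 27.
        "above_adjusted": percent(ABOVE_GRADUATED, above_total - 2),
        # Two of the five below-CMS graduates took five years: 3 of 14.
        "below_normal_time": percent(BELOW_GRADUATED - 2, below_total),
    }


print(graduation_rates())
# {'above': 72.4, 'below': 35.7, 'above_adjusted': 77.8, 'below_normal_time': 21.4}
```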
From the point of view of predicting success at Worcester Polytechnic Institute alone, which, of course, was by no means our whole problem, our statistical predictions would have been somewhat improved if the ratings had been included in the critical minimum score instead of being used only as supplementary information for the guidance interview. The r between the ratings by members of the staff and subsequent marks at Worcester Polytechnic Institute was .32. Moreover, it was found that for predicting success at Worcester Polytechnic Institute a part of our battery, properly weighted, was somewhat better than the entire battery as it was used. The multiple R between first-year marks and the weighted sum of the mathematics part of the Carnegie Test, the Iowa Mathematics Test, and the Yale Spatial Visualization Test was .60. This was the highest correlation we were able to get from our test data, and it should be compared with an r of .52 between our total battery and first-year marks. The regression equation, showing the weights, is as follows:

$$X_1 = .23X_2 + .011X_3 + .13X_4 + 4.71$$

where $X_1$ is marks at Worcester Polytechnic Institute; $X_2$, the numerical score on the Carnegie Test; $X_3$, the Iowa Mathematics score; and $X_4$, the Yale Spatial score. The zero-order correlations among these four variables were as follows:

$$r_{12} = .53 \qquad r_{23} = .57$$
$$r_{13} = .36 \qquad r_{24} = .32$$
$$r_{14} = .43 \qquad r_{34} = .39$$

² This should be compared with 58.4, the percentage of the entire entering classes at Worcester Polytechnic Institute eventually graduating during these same years.
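The multiple R of .60 can be checked from the six zero-order correlations alone: solve the normal equations for the standardized (beta) weights, then take R² as the sum of each beta times the predictor's correlation with marks. The sketch below assumes the printed correlations read, left column, r12, r13, r14 and, right column, r23, r24, r34 (the subscripts are partly garbled in the printed text); with that reading it gives R ≈ .597. The raw weights of the regression equation cannot be recovered this way, because the article does not report the standard deviations of the variables. The small-system solver is a plain Gaussian elimination written for illustration.

```python
# Zero-order correlations (1 = marks at WPI, 2 = Carnegie Numerical,
# 3 = Iowa Mathematics, 4 = Yale Spatial), as read from the article.
ZERO_ORDER = {
    (1, 2): 0.53, (1, 3): 0.36, (1, 4): 0.43,
    (2, 3): 0.57, (2, 4): 0.32, (3, 4): 0.39,
}


def corr(i: int, j: int) -> float:
    """Symmetric lookup; a variable correlates 1.0 with itself."""
    return 1.0 if i == j else ZERO_ORDER[(min(i, j), max(i, j))]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_r(criterion: int, predictors: list[int]) -> tuple[float, list[float]]:
    """Return (R, beta weights) for predicting `criterion` from `predictors`."""
    rxx = [[corr(i, j) for j in predictors] for i in predictors]
    rxy = [corr(criterion, j) for j in predictors]
    betas = solve(rxx, rxy)
    r_squared = sum(beta * r for beta, r in zip(betas, rxy))
    return r_squared ** 0.5, betas


r_mult, betas = multiple_r(1, [2, 3, 4])
print(round(r_mult, 3), [round(b, 3) for b in betas])  # 0.597 [0.439, -0.004, 0.291]
```

Under this reading the standardized weight for the Iowa Mathematics score is close to zero, which is consistent with the very small raw weight the printed equation gives $X_3$: once the Carnegie numerical score is known, the Iowa test adds little further information about marks.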


Following up the idea that there might be important personality and character differences between the five boys who were below the CMS and yet eventually graduated and the six who were above the CMS but did not graduate, we obtained ratings on each of these eleven students from one or more instructors at Worcester Polytechnic Institute. Each instructor rated only those whom he remembered well in his classes. The instructors did not know the purpose of the ratings. The average ratings of the graduating and non-graduating students on the different traits and characteristics are given in Table II. These ratings were, of course, independent of those made at the time of the guidance program and are based on lengthy observation of the students in college classes. They indicate clearly that the five men who graduated had certain qualities (not measured by our tests) which the six non-graduates did not possess, at least not to the same degree. Until some method is perfected whereby such qualities can be measured more accurately before students enter college, the usual predictions of college success will probably remain around twenty per cent lower than they otherwise would be.

TABLE II. PERSONALITY AND CHARACTER RATINGS OF THE FIVE GRADUATING AND SIX NON-GRADUATING STUDENTS FOR WHOM THE PREDICTIONS WERE INACCURATE

                                                 GRADUATING   NON-GRADUATING
Effort, industry, and persistence                    5.7          2.4¹
Reliability, dependability, and promptness
  with assignments                                   5.8          2.1
Attention, interest, alertness                       5.9          3.1
[trait label illegible in source]                    5.5          3.4
[trait label illegible in source]                    5.5          3.1
Rapidity and dispatch in getting [illegible]         5.1          2.8
[trait label illegible in source]                    5.6          2.8

¹ The ratings were made on a graphic rating scale; a rating of 9 was the highest possible and a rating of 1 the lowest. There were fewer ratings for the students who withdrew than for those who graduated, and consequently the former are less reliable. There was high agreement among the instructors in the rating of each student.

A Follow-up Study of the Attitudes of the Boys and Their Parents 
Toward the Guidance Program.—In an attempt to measure the 
success of the program from the point of view of student and parental attitude, a questionnaire study was made of a rather large sample of the boys who had attended the guidance program during the preceding five years. Two questionnaires, one for the boy and one for the parents, were sent to one hundred seventy-six homes. Of the parents, one hundred thirteen, or sixty-four per cent, returned the questionnaire completely filled out; of the boys, seventy-four, or forty-two per cent, did so. It is not known whether those returning the questionnaires were a random sample of the group so far as attitudes are concerned. It is quite possible that those replying included a higher-than-normal percentage who were favorable toward the program, although we have no direct evidence either for or against this possibility. However, it seems doubtful that this factor has appreciably affected the results on more than one or two of the questions.¹

The main results from this part of the study are as follows: 

(1) In reply to the question as to whether the guidance pro- 
gram (a) served to confirm original vocational choice, (b) changed 
original choice, (c) helped to formulate a choice in the first place, 
or (d) had no effect, sixty-five per cent of the boys marked (a), 
eleven per cent (b), fourteen per cent (c), three per cent (d), and 
seven per cent gave miscellaneous answers. In reply to a similar 
question about the choice of an educational institution, the votes 
were forty-seven per cent, fourteen per cent, twenty-one per 
cent, eleven per cent, and seven per cent, respectively. 
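The response rates and the shares in paragraph (1) are simple proportions. The sketch below checks them and shows that the summary's figure of thirty-five per cent, for boys whom the program helped to formulate or change a choice, equals answers (b) and (c) combined on the question about an educational institution. The dictionary labels are shorthand introduced here, not the wording of the questionnaire.

```python
def percent(part: int, whole: int) -> int:
    """Whole-number percentage, as the article reports them."""
    return round(100 * part / whole)


# Returns from the 176 homes to which questionnaires were sent.
parents_rate = percent(113, 176)
boys_rate = percent(74, 176)

# Boys' answers to question (1), in per cent: (a) confirmed, (b) changed,
# (c) helped formulate, (d) no effect, and miscellaneous.
vocational = {"confirmed": 65, "changed": 11, "formulated": 14, "no_effect": 3, "misc": 7}
institution = {"confirmed": 47, "changed": 14, "formulated": 21, "no_effect": 11, "misc": 7}

# Each set of answers accounts for all respondents.
assert sum(vocational.values()) == 100
assert sum(institution.values()) == 100

# Boys for whom the program changed or first formed a choice of institution.
helped_choose = institution["changed"] + institution["formulated"]
print(parents_rate, boys_rate, helped_choose)  # 64 42 35
```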

(2) With regard to the relative values of the different features of the guidance program, the boys' ratings were as follows: the counselor's interview with the student was rated as the most important; the trips were rated second; the lectures, third; the tests, fourth; informal conversations with members of the staff, fifth; the counselor's interview with parents, sixth. In the parents' ratings first place was given to the lectures and third place to the counselor's interviews with the boys. Except for this difference their ranking of the features was the same as that of the boys.

¹ Results similar to ours were found in a study by Marden in 1937 in which he interviewed all the boys who had entered Worcester Polytechnic Institute up to that time and sent questionnaires to all others. He obtained returns from seventy-six per cent of the cases. In all places where his questions correspond to ours there was very close agreement in the two sets of results. Marden, R. D.: An Evaluation of a Pre-College Program of Guidance. Unpublished manuscript filed at Worcester Polytechnic Institute.

(3) In reply to the question, “Should the interview between the boy and the guidance counselor be longer?”, seventy per cent of the boys and seventy-two per cent of the parents said “yes”; thirty per cent of the boys and twenty-eight per cent of the parents said “no.”

(4) Ninety-three per cent of the parents were satisfied with the interviews which they had with the counselor or with the letter received from him summarizing the results of the tests and of the conference with the boy. Our sampling may have decidedly influenced this result. Its chief value probably lies in the comparison it provides with the results in paragraph (3) above: although ninety-three per cent of the parents said that their own interviews or letters were satisfactory, seventy-two per cent felt that a longer interview between their sons and the counselor would have been wise.

(5) In response to a question about ways in which the guidance program might be improved, the suggestions receiving the most votes from the boys were these: (a) that the interviews should be longer (seventy per cent); (b) that there should be more individual work involving handling tools and doing experiments (twenty-one per cent); (c) that the program should be lengthened by two or three days (twelve per cent); and (d) that there should be more time for informal discussion and conversation with instructors at Worcester Polytechnic Institute (twelve per cent).


SUMMARY 


The purpose of this paper has been to describe and evaluate a 
ten-day guidance program for high-school juniors and seniors 
which has been conducted annually for nine years at Worcester 
Polytechnic Institute. 







The main problems of the boys as revealed in the interviews 
were as follows: (1) their chances of success in college; (2) the 
relative chances of success in a liberal arts college and in an 
engineering college; (3) the problem of vocational choice; (4) the 
problem of educational planning and of choice of a college or 
applied school; (5) the problem as to whether a fifth year of 
preparation between high school and college was advisable; 
(6) the problem of the conflict between the boy’s ambition and 
that of his parent or parents for him; (7) the confusion of ability 
in mechanical manipulation with engineering aptitude and 
interest. 

The evaluation of the program by the boys and their parents, 
after they had had time to act upon the guidance received, was 
indicated by such results as these: (1) Over ninety per cent of the 
boys said that if they were back in high school they would 
attend the program again. (2) In rating the relative values of 
the different features of the program, the boys’ ratings were as 
follows: first, the interview with the counselor; second, the trips; 
third, the lectures; fourth, the tests. (3) In judging the main effects which the program had on their educational planning, forty-seven per cent of the boys felt that it confirmed their original choice of an educational institution, thirty-five per cent thought that it helped them to formulate or to change their choices, seven per cent mentioned miscellaneous advantages derived, and eleven per cent said that it had no effect.

The accuracy of the measurements upon which the guidance 
interviews were based was judged by a study of the degree of cor- 
respondence between our test results and success of the boys 
in Worcester Polytechnic Institute, the only institution to which 
a sufficient number of the boys have gone to date to justify any 
statistical study. Of those students standing above our critical minimum score, and therefore advised that they would probably succeed in college, seventy-two per cent graduated, and an additional six per cent were making good marks and would probably have graduated had they not withdrawn. Of those below the critical minimum score only twenty-one per cent graduated in normal time. The correlation between our battery of aptitude tests and marks over four years in the college was found to be .48. The multiple correlation between the weighted sum of three of the tests and first-year marks was .60.










THE VOCABULARY SECTIONS OF THE
COOPERATIVE ENGLISH TESTS AT THE HIGHER 
LEVELS OF DIFFICULTY 


R. G. SIMPSON 
Carnegie Institute of Technology 


This paper presents an analysis of the vocabulary sections of the Coöperative English Tests at the higher levels of difficulty. It is an attempt to determine why the students who entered the freshman class of the engineering college last year did not do particularly well on these sections of the tests, especially on the active vocabulary section. A brief discussion of the vocabulary sections of these tests is given first to familiarize the reader with their general nature.


VOCABULARY SECTIONS AT THE HIGHER LEVELS OF DIFFICULTY 


One of these tests is called Test C-2: Reading Comprehension; 
the other is called Test B-2: Effectiveness of Expression. The 
vocabulary section of the former test is a recognition vocabulary 
which is intended to measure the student’s ability to recognize 
word meanings, whereas the vocabulary section of the latter test 
is an active recall vocabulary and is intended to measure the 
student’s ability to recall words whose meanings are described in 
sentences. The directions for doing the vocabulary sections of 
these tests are quoted verbatim in the succeeding paragraphs. 

The directions for administering the vocabulary section of Test 
C-2: Reading Comprehension, are as follows: 

“In each group below, select the numbered word or phrase 
which most nearly corresponds in meaning to the word at the head 
of that group, and put its number in the parentheses at the right. 
It is quite likely that you will finish this part before the time is 
up. In that case, go on immediately to Part II.” 


The time limit for this part of the test is fifteen minutes. 

The directions for administering the vocabulary section of Test 
B-2: Effectiveness of Expression, are as follows: 

“Each sentence below describes a certain word. The number 
in parentheses shows how many letters there are in the word. 
You are to think of the exact word which best fits the sentence,
and find its FIRST LETTER among the choices given below the
sentence. Put the number of this initial letter in the parentheses
at the right. Do not spend too much time on any one item; if
you cannot think of the right word, go on to the next item.”


The time limit for this part of the test is ten minutes. 

These instructions seem quite adequate. They are included 
here to give the reader additional information on the nature of 
these sections of the tests. 


THE SELECTION OF THE WORDS FOR THESE VOCABULARY TESTS 


It is stated in the leaflet containing information on the con- 
struction, interpretation, and use of these tests that all of the 
words used in the vocabulary sections were selected from the 
Thorndike Word Lists.¹ It is also stated that the words comprising
the recognition vocabulary represent a sampling of words
from many subject-matter fields and are arranged in difficulty 
from easy to very hard words. It does not state how the diffi- 
culty level of the words was determined, except to say that the 
words were taken from the Thorndike Word Lists, which may be 
as adequate as other methods for determining word difficulty. 
We may, therefore, regard this vocabulary as a reading vocabulary 
to distinguish it from the active vocabulary of Test B-2. 

The leaflet further states that the active vocabulary of the 
expression test is composed of words taken from the Thorndike 
List, and that they were chosen not only with reference to the frequency
of use but also with due consideration for obtaining an adequate 
description of each word in the test. We may, therefore, regard 
this vocabulary test as a device for measuring a person’s speaking 
and writing knowledge of words. At least, one may presume that 
the need to recall words anticipates an active use of them in 
speaking and correspondence. 


FREQUENCY OF USE—RELATIVE DIFFICULTY—OF WORDS AS PER 
THORNDIKE’S WORD MANUAL 


Tables I and II are made up of data based on the word count 
in the Thorndike Word Lists. 

The data of Table I correspond quite well with those of the 
table in the leaflet accompanying the tests. There is, however, a
slight discrepancy of one word at the 15th-thousand level and
one at the 20th-thousand level and above.

1 Thorndike, E. L.: The Teacher’s Word Book of Twenty Thousand Words.
New York: Bureau of Publications, Teachers College, Columbia University,
1932 (revised).


TABLE I.—FREQUENCY LEVEL IN THOUSANDS OF THE WORDS
USED IN THE RECOGNITION VOCABULARY OF TEST C-2:
READING COMPREHENSION

Frequency of occurrence      Number of
in thousands                 words
20th and above                 24
19th                            1
18th                            1
17th                            2
16th                            3
15th                            4
14th                            4
13th                            8
12th                            2
11th                            5
10th                            5
9th                             0
8th                             1
Total                          60


The median difficulty of these sixty words occurs in the 16th 
thousand. There are twenty-four of them at the 20th-thousand 
level and above. That number is two-fifths of all the words in 
the test. However, in spite of this fact, the distribution of the 
scores of three hundred thirty-three freshman students on this 
part of the test, as revealed in Table III, is quite good. It 
apparently does not make a great deal of difference whether the 
distribution of these words on the basis of their frequency of use 
(relative difficulty) is symmetrical or otherwise, if one wishes to 
obtain a reasonably good distribution of the performance of a 
large number of students on the vocabulary section of this test. 
One is prompted, however, to ask why there are so many words 
included in this test from the upper levels of difficulty of the 
Thorndike Word Manual. 

The median difficulty of the forty-five words of Table II occurs
at the 8th-thousand level. There are thirteen of them at the
5th-thousand level and only eight at the 15th-thousand level and
above. This distribution is the reverse of that for the recognition
vocabulary test. One may wish to raise the question: why are so
many words in this vocabulary test drawn from the lower levels
of the Thorndike Word Manual? In all probability it is desirable
to have many more easy words in an active vocabulary test than
in a reading (recognition) vocabulary test, since a person’s active
vocabulary is always much more limited than his reading
vocabulary.

TABLE II.—FREQUENCY LEVEL IN THOUSANDS OF THE WORDS
USED IN THE ACTIVE VOCABULARY OF TEST B-2:
EFFECTIVENESS OF EXPRESSION

Frequency of occurrence      Number of
in thousands                 words
20th and above                  0
19th                            4
18th                            1
17th                            0
16th                            0
15th                            3
14th                            0
13th                            0
12th                            0
11th                            1
10th                            2
9th                             5
8th                             7
7th                             6
6th                             3
5th                            13
Total                          45
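As a check on the medians quoted for Tables I and II, the short sketch below locates the thousand-level that contains the middle word (or words) of each list, counting from the easiest level upward. It assumes the row labels as read from the two tables, with 20 standing for "20th thousand and above"; the helper names are ours.

```python
# Number of words at each frequency level (in thousands), as read from Tables I and II.
# The key 20 stands for "20th thousand and above".
TABLE_I = {20: 24, 19: 1, 18: 1, 17: 2, 16: 3, 15: 4, 14: 4,
           13: 8, 12: 2, 11: 5, 10: 5, 9: 0, 8: 1}
TABLE_II = {20: 0, 19: 4, 18: 1, 17: 0, 16: 0, 15: 3, 14: 0, 13: 0,
            12: 0, 11: 1, 10: 2, 9: 5, 8: 7, 7: 6, 6: 3, 5: 13}


def median_level(table):
    """Level(s) holding the middle word(s) when words are ranked from easiest up."""
    n = sum(table.values())
    middle = {(n + 1) // 2, (n + 2) // 2}  # one position if n is odd, two if even
    found = set()
    cumulative = 0
    for level in sorted(table):
        first = cumulative + 1
        cumulative += table[level]
        found.update(level for pos in middle if first <= pos <= cumulative)
    return sorted(found)


def count_at_or_above(table, level):
    """Number of words at the given level or any harder one."""
    return sum(f for lv, f in table.items() if lv >= level)


print("Table I median level:", median_level(TABLE_I))     # [16]
print("Table II median level:", median_level(TABLE_II))   # [8]
print("Table II, 15th thousand and above:", count_at_or_above(TABLE_II, 15))
```

The same counts reproduce the statements in the text: twenty-four recognition words at the 20th-thousand level and above, and eight active-vocabulary words at the 15th-thousand level and above.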


PERFORMANCE OF FRESHMAN ENGINEERING STUDENTS ON THE 
VOCABULARY SECTIONS AT THE HIGHER LEVELS OF FORM Q. 


Frequency Tables III and IV and the graphs in Figures 1 and
2 are constructed from the records of several hundred freshman
students who took the tests in the fall of the school year
1942-1943. The two tables show how many words each student of a
group of three hundred thirty-three freshmen answered correctly,
whereas the graphs show how many times each word was
answered correctly by two hundred freshman engineering
students.


TABLE III.—THE DISTRIBUTION OF THE PERFORMANCE OF 333
FRESHMAN ENGINEERING STUDENTS ON THE VOCABULARY
SECTION OF TEST C-2: READING COMPREHENSION

(In each pair of columns, "Words" is the number of words whose
meanings were recognized and "Students" is the number of students
correctly recognizing that many word meanings.)

Words  Students   Words  Students   Words  Students   Words  Students
  1       1        16      10        31      11        46       3
  2       0        17      11        32       5        47       1
  3       4        18       5        33       1        48       1
  4       0        19      18        34      12        49       2
  5       3        20      16        35       8        50       2
  6       1        21       9        36       2        51       1
  7       2        22      11        37       3        52       0
  8       3        23      23        38       7        53       2
  9       9        24       0        39       3        54       0
 10      12        25      13        40       5        55       0
 11       7        26      17        41      10        56       0
 12       8        27       6        42       4        57       0
 13       7        28       4        43       4        58       0
 14       1        29       7        44       1        59       0
 15      21        30       8        45       8        60       0

Median: 23 words                                        Total: 333


Table III shows that the median performance of three hundred
thirty-three freshman engineering students on the recognition
vocabulary test is twenty-three words out of a possible sixty.
While this performance is not particularly impressive, it is
probably as good as can be expected, considering that twenty-four
of these sixty words were at the 20th-thousand level and above
as determined by the word-count technique of the Thorndike
Word Manual.





Table IV shows that the median performance of the three
hundred thirty-three students on the active vocabulary is
thirteen words out of a possible forty-five. This is a very poor
performance for a select group of college freshmen. Since the
majority of the words in this particular vocabulary test are
found at the lower levels of the Thorndike Word List, it is
doubtful whether their difficulty can be regarded as a cause of
the poor performance. It is a good guess that both the ten-minute
time limit and the technique involved in answering the items
contributed to the poor performance of these college students on
the active vocabulary section of the test.

TABLE IV.—THE DISTRIBUTION OF THE PERFORMANCE OF 333
FRESHMAN ENGINEERING STUDENTS ON THE VOCABULARY
SECTION OF TEST B-2: EFFECTIVENESS OF EXPRESSION

(In each pair of columns, "Words" is the number of words recalled
and "Students" is the number of students correctly recalling that
many words.)

Words  Students   Words  Students   Words  Students
  1       1        16      22        31       0
  2       1        17      16        32       0
  3       1        18      17        33       0
  4       4        19      17        34       0
  5       5        20      15        35       0
  6      13        21       8        36       0
  7      10        22       5        37       1
  8      19        23       2        38       0
  9      27        24       3        39       0
 10      28        25       2        40       0
 11      19        26       3        41       0
 12      23        27       1        42       0
 13      18        28       1        43       0
 14      25        29       1        44       0
 15      24        30       1        45       0

Median: 13 words                              Total: 333
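The medians in Tables III and IV are simply the score of the middle (167th) of the 333 students. The sketch below recomputes them from the tabulated frequencies and expresses each as a share of the items, roughly 38 per cent of the recognition items against 29 per cent of the active items. The frequencies for 9 and 30 words in Table III are as read from the table; the function name is ours.

```python
# Number of students obtaining each score; index 0 holds the count for a score of 1.
TABLE_III = [1, 0, 4, 0, 3, 1, 2, 3, 9, 12, 7, 8, 7, 1, 21,
             10, 11, 5, 18, 16, 9, 11, 23, 0, 13, 17, 6, 4, 7, 8,
             11, 5, 1, 12, 8, 2, 3, 7, 3, 5, 10, 4, 4, 1, 8,
             3, 1, 1, 2, 2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0]
TABLE_IV = [1, 1, 1, 4, 5, 13, 10, 19, 27, 28, 19, 23, 18, 25, 24,
            22, 16, 17, 17, 15, 8, 5, 2, 3, 2, 3, 1, 1, 1, 1,
            0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def median_score(counts):
    """Score earned by the middle student when students are ranked by score."""
    n = sum(counts)
    middle = (n + 1) // 2  # the 167th student when n = 333
    cumulative = 0
    for score, students in enumerate(counts, start=1):
        cumulative += students
        if cumulative >= middle:
            return score
    raise ValueError("empty distribution")


for name, counts in (("Table III", TABLE_III), ("Table IV", TABLE_IV)):
    med = median_score(counts)
    print(f"{name}: median {med} of {len(counts)} words ({med / len(counts):.0%})")
```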








Figure 1 contains an analysis of the words whose meanings
were correctly recognized by two hundred freshman students on
the vocabulary section of Test C-2. The graph reveals that the
words in this vocabulary test are not well arranged in order of
difficulty for this population of students. Some of the words near
the end of the test might be shifted to the middle, and some at the
middle might be shifted to positions near the end. However, the
general nature of the distribution is much better than that shown
in Figure 2.

[Figure 1: number of students correctly recognizing each word
(vertical axis) plotted against word numbers as given in the test
(horizontal axis).]

Fig. 1.—Record of 200 freshman engineering students on the vocabulary section
of the Coöperative English Test C-2.

Figure 2 contains an analysis of the words correctly recalled
by the same two hundred freshman students on the active
vocabulary section. This distribution is not nearly as regular as
the one for the words in the recognition vocabulary. Many of the
words, especially those beyond the 28th, are either too difficult
in themselves, or they are so involved in the technique of test
organization that the ten-minute time limit is entirely too brief
to obtain an adequate response on the test as a whole. Some
shifting of the words in this test also seems necessary; a few
might be moved to positions nearer the end of the test.
Incidentally, these same two hundred students are included in
the larger group of three hundred thirty-three freshman students
whose records were used in Tables III and IV.

[Figure 2: number of students correctly recalling each word
(vertical axis) plotted against word numbers as given in the tests
(horizontal axis).]

Fig. 2.—Record of 200 freshman engineering students on the active vocabulary
section of the Coöperative English Test B-2.


THE RELATIONSHIP OF THE TWO VOCABULARY SECTIONS (AT THE 
HIGHER LEVELS) WITH EACH OTHER AND THE CARNEGIE 
MENTAL ABILITY TEST 


The correlations of the two vocabulary sections of the Coöperative
English Test with each other and the Carnegie Mental Ability
Test are given in Table V.

















TABLE V.—COEFFICIENTS OF CORRELATION BETWEEN THE
RECOGNITION AND THE ACTIVE VOCABULARY SECTIONS AND
THE CARNEGIE MENTAL ABILITY TEST

                               Recognition          Active
                               vocabulary test      vocabulary test
Mental Ability Test            .78 ± .04            .61 ± .03
Recognition Vocabulary Test                         .62 ± .03


The recognition and active vocabulary tests correlate quite
well with each other, as may be seen from the table. However,
the recognition vocabulary test correlates much higher with the
mental ability test than does the active vocabulary test. This
result is unexpected, since it is much more difficult to acquire
and use an active vocabulary than a reading vocabulary, and one
might therefore expect the active vocabulary to reflect mental
ability more closely. That point may be disputable, however.
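The ± figures in Table V appear to be probable errors, the usual way of stating the sampling error of a correlation coefficient at the time. A minimal sketch of the product-moment r and the classical probable error, 0.6745(1 − r²)/√N, follows. The number of cases behind Table V is not stated, so the checks below use invented scores only.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probable_error_r(r, n):
    """Classical probable error of r based on n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)
```

Because the probable error shrinks as r grows, the difference between .78 and .61 is larger than either coefficient's stated error, although the two coefficients come from the same students and so are not independent.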


COMMENTS 


The authors say that they selected the words of the vocabulary
sections of the Coöperative English Test from the Thorndike Word
Lists, or from the Thorndike Word Manual of Twenty Thousand
Words. That fact, doubtless, gives some validity to the words
used in the tests, but it does not validate the selection of these
words. The frequency of use, or difficulty, of these words does not
conform to a symmetrical distribution. In fact, the distributions
of these words according to their frequency of use in reading
as determined by the Thorndike technique are skewed. The
words comprising the recognition vocabulary form a frequency
distribution that is skewed to the left (the longer tail lies toward
the easier, more frequent words), whereas the words comprising
the active vocabulary form a frequency distribution that is skewed
to the right (the longer tail lies toward the harder words). Oddly
enough, however, the words forming the skewed distribution in
Table I yield a fairly symmetrical distribution of scores in
Table III. This may indicate that there is little, if any, advantage
in selecting a symmetrical distribution of words according to
their frequency of use in reading, in order to obtain a reasonably
good distribution of responses from students on these words,
when recognition is the ability involved.
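The direction of skew can be checked numerically. The sketch below computes the moment coefficient of skewness, m₃/m₂^1.5, for the word counts of Tables I and II, with the level axis running from the frequent (easy) thousands to the rare (hard) ones, so a negative value means the longer tail lies toward the easy words. Treating the class "20th thousand and above" as exactly 20 is an approximation of ours.

```python
# Word counts by frequency level (in thousands), as read from Tables I and II.
# The open class "20th and above" is approximated by the value 20.
TABLE_I = {20: 24, 19: 1, 18: 1, 17: 2, 16: 3, 15: 4, 14: 4,
           13: 8, 12: 2, 11: 5, 10: 5, 9: 0, 8: 1}
TABLE_II = {20: 0, 19: 4, 18: 1, 17: 0, 16: 0, 15: 3, 14: 0, 13: 0,
            12: 0, 11: 1, 10: 2, 9: 5, 8: 7, 7: 6, 6: 3, 5: 13}


def skewness(table):
    """Moment coefficient of skewness m3 / m2**1.5 for a grouped frequency table."""
    n = sum(table.values())
    mean = sum(level * f for level, f in table.items()) / n
    m2 = sum(f * (level - mean) ** 2 for level, f in table.items()) / n
    m3 = sum(f * (level - mean) ** 3 for level, f in table.items()) / n
    return m3 / m2 ** 1.5


print(f"Table I (recognition): {skewness(TABLE_I):+.2f}")
print(f"Table II (active):     {skewness(TABLE_II):+.2f}")
```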

The words forming the skewed distribution in Table II also
form a skewed distribution in the same direction in Table IV.
These two frequency distributions correspond quite well,
primarily because most of the easiest words were at the beginning
of the test. The implication is that the test is not a good measure
of the freshman students’ active use of words. It is quite
possible that either the ten-minute time limit is not sufficiently
long or the technique of answering the test is too involved to
obtain an adequate response by freshman students. The data in
Table VI tend to support these suggestions.

There may be some doubt as to the advantage of selecting the
words for the active vocabulary from the Thorndike Word Manual,
since it is composed of words obtained from reading materials of
diverse nature. Usually an active vocabulary is active because
people use it a great deal in speaking and in writing. If this is
a correct assumption, then other word lists, in addition to the
Thorndike List, might profitably have been consulted in developing
the active vocabulary.

As has already been mentioned, the distribution in the graph in 
Figure 2 shows that the active vocabulary test is not a good 
measure of the words which college freshman students of this 
population use. In fact, the record of two hundred freshmen 
on this particular test is poor, and especially poor on the last 
sixteen words of the test. There is evidence that this condition 
is due, in part, to the puzzling technique of answering the test 
items of the active vocabulary test. 

The performance of the students is better in the recognition 
vocabulary test than it is in the active vocabulary test, presumably because
the technique of answering the test items in the recognition 
vocabulary test is much simpler than that of answering the test 
items in the active vocabulary test. Of course, the time limit 
in the reading vocabulary test is five minutes longer than that of 
the active vocabulary test, but the words included in the former 
test are, on the whole, more difficult than those included in the 
latter test. The time limit for doing the active vocabulary test 
should be increased, and the technique for answering the test 
items should be simplified if possible. 

There is no point to be made of the fact that the recognition 
and active vocabularies of the test do not correlate higher than 
.62. However, one may wonder why the active vocabulary cor- 
relates .61 with the mental ability test, whereas the recognition 
vocabulary correlates .78 with the same mental test. Evidently 
the recognition type of vocabulary is a better indication of one’s 
mental ability than is the active vocabulary. 





































COMPARISON OF METHODS OF CALCULATING 
MENTAL AGE EQUIVALENTS 


ALEXANDER J. PHILLIPS 


Department of Educational Research, University of Toronto 


‘Mental-Age’ is but one of the numerous ways of expressing 
the intelligence rating of an individual. As a measure of intel- 
ligence, it is undoubtedly due to Binet, although the comparison 
of adults with regard to knowledge or ability had often been made 
previous to his time. It was Binet, however, who showed, in a 
scientific manner, that it was possible to determine the mental 
level of an individual. He also showed that it was possible to 
compare this level with a normal one and consequently deter- 
mine by how many years a child was advanced or retarded. 

Thomson¹ discussed the question of whether mental age is
something already existing or something which is to be defined. 
He concluded that there is certainly something already existing 
which we wish to measure—we assume that that something is a 
unique function of the score. We know that we can make 
chronological age and average score have a one-to-one relation- 
ship provided we select small and varied items in constructing a 
test. In order to make comparisons, one test with another, we 
simply report mental age instead of score. 

Mental age, then, is a definite function of score. If it is a 
simple linear function, then it is simply another name for score 
with a change in the unit used and a shift in the zero point. We 
might define it as the performance (i.e., score) of an individual
expressed in terms of age. 

The determination of mental-age equivalents for raw scores 
leads to a consideration of the different procedures which might 
be employed. Raw scores give mental ages directly if the test 
in question has been previously standardized over different 
chronological-age levels. This was the method used by Binet 
in determining the mental-age equivalent for a specific raw 
score. Analogous to this is the method employed in the
National Intelligence Tests. Here, the average scores have been
computed from the results of 32,372 cases in some nineteen
communities. These average scores constitute ‘norms’ in that they
show what may be expected of the typical or average child of a
given age. The given age applies to pupils who have reached
that age but not the next birthday; e.g., age 10 refers to all those of
10 years, 0 months, through 10 years, 11 months, inclusive. In 
passing, it should be noted that this is not quite in accord with 
the Binet Scale (Stanford Revision).* Here age differences were 
confined to children within two months of a birthday. With the 
norms calculated, a raw score on the National Intelligence Test 
may be readily converted to a mental age by interpolation. 
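Converting a raw score to a mental age from a table of age norms is ordinary linear interpolation between the two neighbouring norms. The sketch below uses a HYPOTHETICAL norms table invented for illustration; it is not the published National Intelligence Test table. It places each norm at the nominal age; since each age group spans a whole year, one might instead place the norm at the middle of the year (e.g., 10.5), and which convention a given test uses is an assumption either way. The norms must increase with age.

```python
from bisect import bisect_left

# HYPOTHETICAL norms (age in years, average raw score), for illustration only.
NORMS = [(8, 40.0), (9, 55.0), (10, 70.0), (11, 82.0)]


def mental_age(raw_score, norms=NORMS):
    """Interpolate a raw score linearly between neighbouring age norms."""
    ages = [age for age, _ in norms]
    scores = [score for _, score in norms]
    if not scores[0] <= raw_score <= scores[-1]:
        raise ValueError("raw score lies outside the range of the norms")
    i = bisect_left(scores, raw_score)  # first norm at or above the raw score
    if scores[i] == raw_score:
        return float(ages[i])
    a0, a1 = ages[i - 1], ages[i]
    s0, s1 = scores[i - 1], scores[i]
    return a0 + (a1 - a0) * (raw_score - s0) / (s1 - s0)


print(mental_age(62.5))  # halfway between the age-9 and age-10 norms
```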

An entirely different procedure for determining mental-age 
equivalents for raw scores has been formulated by Jackson.* He 
has used the method in calculating mental-age equivalents for 
scores made on the Primary Dominion Tests of Intelligence. In 
the following presentation of Jackson’s method, the author has 
selected the results of two hundred sixty-five representative 
cases for Form C of this test. That the resulting IQ scores
of this sample are approximately normally distributed may be
seen from the data given in Table I.


TABLE I.—DISTRIBUTION OF IQ SCORES
Form C of Primary Dominion Test of Intelligence—265 Cases

IQ Score      Frequency
150-159            2
140-149            4
130-139           18
120-129           46
110-119           56
100-109           64
 90- 99           38
 80- 89           21
 70- 79            8
 60- 69            7
 50- 59            1
Total            265


Because of the simple linear relationship between score and
mental age, we first calculate the IQ'' for each case. This is
done as in the usual calculation of IQ, but score is now
substituted for mental age, i.e.,

$$\mathrm{IQ}'' = \frac{S}{\mathrm{CA}} \times 100,$$

where S is the raw score and CA the chronological age in months.
Having determined the IQ'', we have three values for each child:
score, chronological age, and IQ''. These are shown in the form of
frequency distributions in Table II.
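A minimal sketch of the IQ'' calculation follows. Chronological age must be expressed in months for the values to agree with Table III: a score of 18 at CA 6-1 (73 months) gives 18/73 × 100 ≈ 24.7, which is the tabled 25. The function names are ours.

```python
def ca_months(years, months):
    """Chronological age written as years-months (e.g. 6-1) converted to months."""
    return 12 * years + months


def iq_double_prime(score, ca_in_months):
    """IQ'' = S / CA x 100, with the raw score S standing in for mental age."""
    return 100 * score / ca_in_months


print(round(iq_double_prime(18, ca_months(6, 1))))  # first row of Table III
```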














TABLE II.—FREQUENCY DISTRIBUTION OF SCORE, CHRONOLOGICAL
AGE AND IQ'' FROM DOMINION PRIMARY
INTELLIGENCE TEST—FORM C

       Score            Chronological Age            IQ''
Class interval   f   Class interval     f    Class interval   f
    57-61        1    8-3 to 8-5        3        70-75        4
    52-56       12    8-0 to 8-2        0        64-69       13
    47-51       23    7-9 to 7-11      12        58-63       46
    42-46       68    7-6 to 7-8        6        52-57       51
    37-41       56    7-3 to 7-5       13        46-51       58
    32-36       57    7-0 to 7-2       23        40-45       45
    27-31       22    6-9 to 6-11      32        34-39       22
    22-26       10    6-6 to 6-8       57        28-33        9
    17-21        8    6-3 to 6-5       67        22-27        9
    12-16        5    6-0 to 6-2       47        16-21        5
     7-11        3    5-9 to 5-11       5        10-15        3








The next step is to assume a theoretical model in which the 
intelligence quotients are normally distributed about a selected 
mean (109) with a standard deviation of 16. The choice of the 
mean is based upon previous knowledge of the intelligence ratings 
in the area from which the two hundred sixty-five cases were 
obtained. The standard deviation of 16 is selected to give a 
distribution with a particular spread, although any value 
between 14 and 18 would probably prove suitable. The per- 
centiles for the model and for the results on Form C are now 
calculated and equated.* The model gives the ‘normal’ IQ 
scores for each percentile, designated thus, IQ’. Equating per- 
centiles determines the corresponding IQ’ for each IQ”. 





* The calculation of percentiles for the model requires the use of a table
of the probability integral (see Tables for Statisticians and Biometricians,
Table I, by Karl Pearson). Example: $P_{99}$, in column I, gives a ‘z’ value
of 2.3263 in column II. Since

$$z = \frac{X - 109}{16}, \quad \text{hence} \quad 2.3263 = \frac{X - 109}{16}.$$

Solving for X, we have X = 146, which is IQ'.
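The table look-up in the footnote can be replaced by the inverse of the normal distribution function. The sketch below uses Python's `statistics.NormalDist` with the model's mean of 109 and standard deviation of 16, reproduces IQ' = 146 at the 99th percentile, and equates each observed IQ'' to the model IQ' at the same percentile rank. The mid-rank convention for percentile ranks (cases below plus half the ties, divided by N) is our assumption; the text does not say how its percentiles were formed.

```python
from statistics import NormalDist

# Theoretical model from the text: IQ' normally distributed, mean 109, SD 16.
MODEL = NormalDist(mu=109, sigma=16)


def model_iq_prime(p):
    """IQ' at percentile p (0 < p < 1) of the model distribution."""
    return MODEL.inv_cdf(p)


def equate_to_model(iq2_values):
    """Map each observed IQ'' to the model IQ' at the same percentile rank."""
    n = len(iq2_values)
    result = {}
    for value in set(iq2_values):
        below = sum(1 for v in iq2_values if v < value)
        ties = sum(1 for v in iq2_values if v == value)
        result[value] = MODEL.inv_cdf((below + ties / 2) / n)  # mid-rank percentile
    return result


print(round(model_iq_prime(0.99)))  # the footnote's example
```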











We are now in a position to transform the normal IQ scores
(IQ') into mental ages (MA'), thus,

$$\mathrm{MA}' = \frac{\mathrm{CA} \times \mathrm{IQ}'}{100}.$$

(See Table III.) Some smoothing of the MA' is required and may
best be accomplished graphically. This smoothing process yields
the final mental-age equivalents from the MA'.





TABLE III.—MA' CALCULATED FROM IQ'

Score      CA      IQ''     IQ'     MA'
  18      6-1       25       83      60
  36      7-2       42       99      85
  40      7-1       47      104      88
  47      7-2       55      117     100
  52      7-3       60      125     109
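Recomputing MA' from the tabled IQ' with CA in months, as a sketch, gives values within one month of each tabled MA'. The table appears to round some values down (60.6 is shown as 60) and others to the nearest month (108.75 as 109), and the agreement suggests that MA' in Table III is also expressed in months.

```python
def ca_months(years, months):
    """Chronological age written as years-months converted to months."""
    return 12 * years + months


def ma_prime(ca_in_months, iq_prime):
    """MA' = CA x IQ' / 100, in the same unit (months) as CA."""
    return ca_in_months * iq_prime / 100


# Rows of Table III: (score, CA as (years, months), IQ'', IQ', tabled MA').
ROWS = [(18, (6, 1), 25, 83, 60), (36, (7, 2), 42, 99, 85),
        (40, (7, 1), 47, 104, 88), (47, (7, 2), 55, 117, 100),
        (52, (7, 3), 60, 125, 109)]

for score, (years, months), _, iq_p, tabled in ROWS:
    computed = ma_prime(ca_months(years, months), iq_p)
    print(f"score {score}: computed MA' {computed:.1f}, tabled {tabled}")
```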


Each raw score, then, has a mental-age equivalent. It should be
mentioned that too many low scores in the distribution will so distort
the relationship that end-values may be in error. This might 
be termed a possible weakness of the method. 

An alternative method of calculating mental-age equivalents 
is by comparison with another test or other tests. In the follow- 
ing discussion, the National Intelligence Test and Forms A and B 
of the Junior Dominion Intelligence Tests have been used. Since 
these are but illustrative examples, only a small number of cases 
has been considered. Both or all tests were administered to the 
same group of children and different statistical methods applied 
to the results. As a basis of comparison, three cases were con- 
sidered. First, high relationship between the scores on two 
tests (0.937); second, medium relationship (0.764); and, third, 
low relationship (0.472). In each of the three cases, the follow- 
ing statistical procedures were employed in obtaining ‘pre- 
dicted’ mental ages: 

(A) Using the regression line $Y_{(x)} = a + bX$, i.e., the line for
predicting values of Y from known values of X.

(B) Using the regression line $X_{(y)} = a' + b'Y$, i.e., the line for
predicting values of X from known values of Y.

(C) Bisecting the angle between the regression lines using raw
scores.

(D) Bisecting the angle between the regression lines using
standard scores.

(E) Using Otis’ Line of Relation.

(F) Equating percentiles.

Five representative instances of predicted scores have been 
reported for each case discussed. These are to be found in 
Tables IV, V and VI. 


TABLE IV.—COMPARISON OF DIFFERENT TYPES OF PREDICTED
SCORES ON TWO FORMS OF THE SAME TEST

Observed         Predicted scores on Form B           Observed
scores on    (A)    (B)    (C)    (D)*    (F)         scores on
Form A                                                 Form B
   14         14     11     13     12      16             18
   31         28     27     28     27      27             25
   43         38     38     38     38      35             41
   55         48     49     49     49      48             49
   65         56     59     57     58      61             59

* Predicted scores using Otis’ ‘Line of Relation’ (E) differ from
those in column (D) by at most one point in the units digit.


Case I 


This case is a study of the scores of thirty-seven pupils on 
Forms A and B of the Junior Dominion Intelligence Test. The 
relationship between the scores on each form was 0.937. 


(A) Using the regression line $Y_{(x)} = a + bX$

The results shown in Column II of Table IV were derived by
the use of this line. The regression coefficient was first calculated

$$b = \frac{\Sigma AB - \dfrac{(\Sigma A)(\Sigma B)}{N}}{\Sigma A^2 - \dfrac{(\Sigma A)^2}{N}}$$

where $\Sigma A$ = sum of scores on test A
      $\Sigma A^2$ = sum of squares of scores on test A
      $\Sigma B$ = sum of scores on test B
      $\Sigma AB$ = sum of products of scores on tests A and B
      $(\Sigma A)^2$ = square of sum of scores on test A
      $N$ = number of cases

then the regression equation may be written

$$\hat{B} = M_B - bM_A + bA$$

where $\hat{B}$ = estimated score on test B
      $M_B$ = mean score on test B
      $M_A$ = mean score on test A
      $b$ = the regression coefficient
      $A$ = score on test A

This equation gave the estimated score ($\hat{B}$) from the score on
Form A of the test.
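The computation in (A) can be sketched directly from the sums defined above. The scores in the usage check are invented for illustration; the function names are ours.

```python
def regression_b_on_a(a_scores, b_scores):
    """Return (b, M_A, M_B) for the line predicting Form B scores from Form A."""
    n = len(a_scores)
    sum_a, sum_b = sum(a_scores), sum(b_scores)
    sum_ab = sum(a * b for a, b in zip(a_scores, b_scores))
    sum_a2 = sum(a * a for a in a_scores)
    b = (sum_ab - sum_a * sum_b / n) / (sum_a2 - sum_a ** 2 / n)
    return b, sum_a / n, sum_b / n


def predict_b(a, b, mean_a, mean_b):
    """Estimated Form B score: M_B - b M_A + b A."""
    return mean_b - b * mean_a + b * a


b, m_a, m_b = regression_b_on_a([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(b, predict_b(5, b, m_a, m_b))
```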


(B) Using the regression line $X_{(y)} = a' + b'Y$

The use of this line gave results very close to those
from the line $Y_{(x)} = a + bX$, as shown in Column III of Table IV.
It should be noted, however, that the use of this line necessitates
a change in the regression equation. We must now consider

$$\hat{A} = M_A - b'M_B + b'B$$

where $\hat{A}$ = estimated score on Form A
      $b'$ = the new regression coefficient
      $B$ = score on Form B

and the other quantities are as defined earlier. We are concerned,
however, with predicting B and not A; hence, we must solve this
equation for B. The result is

$$B' = M_B - \frac{1}{b'}M_A + \frac{1}{b'}A$$

where $B'$ is the predicted value of B using the regression line
$X_{(y)} = a' + b'Y$.

Again, the other quantities are as defined previously.
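A sketch of method (B) follows: b' is computed from the same kind of sums with the roles of A and B exchanged, and the solved equation is then used to predict B. Since the product of the two regression coefficients equals r², the slopes quoted later for Case I (b = 0.8169 and b' = 1.0749) imply r ≈ 0.937, which agrees with the correlation reported for the two forms. The function names are ours and the check scores are invented.

```python
import math


def slope_a_on_b(a_scores, b_scores):
    """b' of X_(y) = a' + b'Y, the regression of Form A scores on Form B scores."""
    n = len(a_scores)
    sum_a, sum_b = sum(a_scores), sum(b_scores)
    sum_ab = sum(a * b for a, b in zip(a_scores, b_scores))
    sum_b2 = sum(b * b for b in b_scores)
    return (sum_ab - sum_a * sum_b / n) / (sum_b2 - sum_b ** 2 / n)


def predict_b_prime(a, b_prime, mean_a, mean_b):
    """B' = M_B - M_A / b' + A / b' (the A-on-B line solved for B)."""
    return mean_b - mean_a / b_prime + a / b_prime


# The product of the two regression slopes is r squared.
implied_r = math.sqrt(0.8169 * 1.0749)
print(round(implied_r, 3))
```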


(C) Bisecting the angle between the regression lines using raw
scores.

Before bisecting the angle between the regression lines using
raw scores, it is necessary to calculate the angle which each line
makes with the X axis. This follows directly from the previously
determined slopes of the two lines as given by the regression
coefficients. Tables of natural tangents give the angle
corresponding to any known slope. We have previously calculated
the slopes of the lines $Y_{(x)} = a + bX$ (0.8169) and $X_{(y)} = a' + b'Y$
(1.0749). Note, however, that the value 1.0749 is the slope of
the line $X_{(y)} = a' + b'Y$ with relation to the Y axis. To transfer
the relation to the X axis, we must consider the reciprocal value,
viz. 0.9303. The calculation necessary to bisect the angle
between these two lines is shown.


$$b = 0.8169 = \tan 39^\circ 15', \qquad \frac{1}{b'} = 0.9303 = \tan 42^\circ 56'$$

Adding one half of the difference between these two angles to the
smaller, we determine the slope ($b''$) of the line which bisects the
angle between the two regression lines.

$$b'' = \tan 41^\circ 05' = 0.8717$$


Applying this regression coefficient, we estimate the values of 
B from the known values of A. In this case, the estimated 
values are designated B" and are shown in Table IV. It will be 
noted that little variation results from the use of this method. 
This was the procedure employed in calculating a table of 
tentative mental age equivalents for the Revised Beta Examination
scores.⁵ This provisional table was derived by a comparison
of the scores of three hundred sixty-four cases on the Revised 
Beta and the Otis Self-Administering Tests of Mental Ability 
(Higher Examination). The correlation between these tests 
was 0.71 + 0.02. The Otis score corresponding to the Revised 
Beta score was read from the midline bisecting the angle between 
the two regression lines. The Binet mental age corresponding 
to each Otis score was then obtained from an interpretation 
chart given in Otis’ Manual of Directions. 
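The tables of natural tangents used in (C) can be replaced by `math.atan` and `math.tan`. The sketch below reproduces the two angles and the bisecting slope from the slopes quoted for Case I. Working without rounding the angles to whole minutes gives b'' of about 0.872 rather than 0.8717, a difference far too small to change any predicted score. The function name is ours.

```python
import math


def bisector_slope(b, b_prime):
    """Slope b'' of the line bisecting the angle between the two regression lines.

    b is the slope of the B-on-A line against the X axis; b_prime is measured
    against the Y axis, so the second line's slope against the X axis is 1 / b_prime.
    """
    angle_1 = math.atan(b)
    angle_2 = math.atan(1 / b_prime)
    # Adding half the difference to the smaller angle is the same as taking the mean angle.
    return math.tan((angle_1 + angle_2) / 2)


b_double_prime = bisector_slope(0.8169, 1.0749)
print(f"b'' = {b_double_prime:.4f}")
```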


(D) Bisecting the angle between the regression lines using
standard scores.

Substituting standard scores for raw scores, the equation of
the line bisecting the regression lines in standard form becomes

$$\frac{B''_s - M_B}{S_B} = \frac{A - M_A}{S_A}$$

where $B''_s$ = the estimated value of B
      $S_A$ = the standard deviation of A
      $S_B$ = the standard deviation of B
      $A$ = score on Form A

and other quantities are as previously defined.
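Why the bisector takes this simple form: in standard scores the two regression lines are $z_B = r z_A$ and $z_A = r z_B$, which make angles of arctan r and arctan(1/r) with the $z_A$ axis. These angles sum to 90 degrees, so the bisector is $z_B = z_A$ whatever the value of r. Translated back to raw scores, this gives $B''_s = M_B + (S_B/S_A)(A - M_A)$, sketched below. Population and sample standard deviations give the same ratio $S_B/S_A$, so the choice of `pstdev` does not matter; the function name is ours.

```python
from statistics import mean, pstdev


def standard_score_line(a_scores, b_scores):
    """Return a predictor for B''_s = M_B + (S_B / S_A)(A - M_A)."""
    m_a, m_b = mean(a_scores), mean(b_scores)
    s_a, s_b = pstdev(a_scores), pstdev(b_scores)

    def predict(a):
        return m_b + (s_b / s_a) * (a - m_a)

    return predict


predict = standard_score_line([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(predict(5))
```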


(E) Using Otis’ Line of Relation.

Otis⁶ has shown that if the number of cases upon which the
comparison of two tests is to be based is less than fifty, it is
impossible to calculate an accurate correspondence. Should a
first approximation be required, Otis suggests the comparison of
scores of the same rank. This rough method leads to difficulty,
as some smoothing is necessary, e.g., ‘splitting the difference’
when a score on one test corresponds to two or more different
scores on the other. It is possible, however, to determine the
general trend of the correspondence by this rough method and to
draw a straight line in such a position as best to represent this
general trend. This straight line is termed by Otis the ‘line of
relation’ between scores of the two tests and, from it, the score
on one test corresponding to any given score on the other may be
read. In an article⁷ discussing the reliability of the 1915 edition
of the Stanford Revision of the Beta Scale, Otis offers a more
detailed discussion of his ‘line of relation.’ Since only one test
was considered in that study, the scale was divided into two
parts: the first half of the tests of each age group was placed in
Scale A and the second half in Scale B. Scores on Scale A were
then plotted on the X axis of a graph and scores on Scale B
on the Y axis. The line which resulted was called by Otis a
‘single’ relation line to distinguish it from a regression line, of
which there are two for any pair of variables. The method of
drawing such a line is based upon the assumptions that the
medians and upper and lower quartiles of each distribution most
probably correspond, and that the ranks of a specific value are most
probably the same in either distribution. Since the line seemed
to show no marked curvature, it was assumed to be rectilinear
and presumably approximated the line

$$y = \frac{\sigma_y}{\sigma_x}\,x.$$

The statement made by Otis, “that the equation of the line
which most probably expresses the true relationship between
x and y is $y = \frac{\sigma_y}{\sigma_x}\,x$,” has been challenged. Statisticians
contend that the regression line $y = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,x$ expresses the true
relationship between x and y. Otis (8) has published a proof of his
statement. He has shown that the equation of the line representing
true correspondence between scores on two tests, X and Y,
measuring the same or identical traits, is

$$ y = \sqrt{\frac{r_{yy}}{r_{xx}}}\;\frac{\sigma_y}{\sigma_x}\,x $$

where $r_{yy}$ and $r_{xx}$ are the reliability coefficients of the variables
y and x, respectively. When dealing with two forms of the same test,
the presumption is that one form is as reliable as the other; hence
the assumption $r_{xx} = r_{yy}$, and $\sqrt{r_{yy}/r_{xx}}$ becomes unity.

The foregoing assumption reduces the equation above to

$$ y = \frac{\sigma_y}{\sigma_x}\,x, $$

which is Otis’ equation of the line most probably representing the
true correspondence between scores on two forms of a test. The
situation in which two different tests are used does not apply to
Case I and will be reserved for Case II.
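
A small Python sketch contrasts the two lines in dispute. It assumes x and y are deviations from their means; the function names, standard deviations, and reliabilities are hypothetical, chosen only to show that equal reliabilities reduce Otis’ factor to unity and that the square root damps any difference between the reliabilities.

```python
import math


def regression_slope(sd_x, sd_y, r_xy):
    """Slope of the regression of y on x: y = r_xy * (sd_y / sd_x) * x."""
    return r_xy * sd_y / sd_x


def otis_slope(sd_x, sd_y, r_xx, r_yy):
    """Slope of Otis' line of true correspondence:
    y = sqrt(r_yy / r_xx) * (sd_y / sd_x) * x."""
    return math.sqrt(r_yy / r_xx) * sd_y / sd_x


sd_x, sd_y = 10.0, 15.0

# Two equally reliable forms: the factor is 1 and the slope is sd_y / sd_x.
print(otis_slope(sd_x, sd_y, 0.90, 0.90))   # 1.5
# The regression slope is smaller by the factor r_xy.
print(regression_slope(sd_x, sd_y, 0.80))   # about 1.2
# Unequal reliabilities: a ratio of about 1.19 becomes a factor of about 1.09.
print(math.sqrt(0.95 / 0.80))
```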
In using Otis’ ‘Line of Relation’ to predict mental-age equivalents
for raw scores, it is to be noted that the values of all variables are
measured from their respective means. Our regression equation
becomes

$$ B''' = \frac{S_B}{S_A}\,A + M_B - \frac{S_B}{S_A}\,M_A $$

where $B'''$ denotes the estimated value of B using the line of
relation and the other quantities are as defined previously. This
regression equation becomes identical with that for the bisection
of the angle between the regression lines in standard scores;
hence, $B_z'' = B'''$ in this case.
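
The identity can be checked numerically. In standard scores the two regression lines are $z_B = r z_A$ and $z_A = r z_B$; drawn against the $z_A$ axis the second has slope $1/r$, and since $\arctan r + \arctan(1/r) = \pi/2$ for $r > 0$, their bisector is the 45-degree line $z_B = z_A$, which in raw scores is Otis’ line. The Python sketch assumes a positive correlation; the helper names and the means and standard deviations in the usage line are hypothetical.

```python
import math


def bisector_slope(m1, m2):
    """Slope of the line bisecting the angle between two lines through the
    origin with positive slopes m1 and m2."""
    return math.tan((math.atan(m1) + math.atan(m2)) / 2)


def otis_prediction(a, mean_a, sd_a, mean_b, sd_b):
    """Line-of-relation estimate of a Form B score from a Form A score:
    (S_B / S_A) * A + M_B - (S_B / S_A) * M_A."""
    ratio = sd_b / sd_a
    return ratio * a + mean_b - ratio * mean_a


# In z-score form the bisector has slope 1 whatever the correlation.
for r in (0.3, 0.764, 0.95):
    print(r, round(bisector_slope(r, 1 / r), 12))

# Hypothetical form statistics: M_A = 30, S_A = 8, M_B = 35, S_B = 10.
print(otis_prediction(38, 30, 8, 35, 10))
```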


(F) Equating percentiles. 


The final statistical procedure which might be applied to our 
problem is that of equating percentiles. Since we are concerned 
with a small sample (thirty-seven), we need only rank the papers 
in order of magnitude and equate them. Some rounding-off is 
necessary when two or more scores on Form B correspond to a
single score on Form A. This leads to variations from the
predicted scores found by the other methods. It will be noted,
however, that such variations are slight when the relationship
between the scores on the two tests is high, as it is in Case I.
Examination of Table IV would lead to the conclusion that there
is little to choose between any of the six statistical procedures.
The equating of percentiles (B") seems to agree closely with the
results of the other methods and, because of the ease of
calculation, might be considered a first choice. Certainly, when
the relationship between scores is high, it seems to matter little
which particular statistical procedure is chosen.
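
Equating by rank is easily mechanised. The sketch assumes, as in the study, that the same pupils took both forms, so the two lists have the same length. Where one Form A score is matched with several Form B scores it averages them, which is one possible way of doing the ‘rounding-off’ mentioned above, not necessarily the author’s. The function name and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def equate_by_rank(scores_a, scores_b):
    """Pair the k-th lowest Form A score with the k-th lowest Form B score.

    When one Form A score is paired with several Form B scores, the Form B
    scores are averaged (an assumed way of 'rounding off').
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("the same pupils must have taken both forms")
    paired = defaultdict(list)
    for a, b in zip(sorted(scores_a), sorted(scores_b)):
        paired[a].append(b)
    return {a: float(mean(bs)) for a, bs in sorted(paired.items())}


# Made-up scores of eight pupils on the two forms.
form_a = [15, 12, 22, 18, 15, 25, 20, 22]
form_b = [13, 10, 24, 17, 16, 28, 21, 23]
print(equate_by_rank(form_a, form_b))
# {12: 10.0, 15: 14.5, 18: 17.0, 20: 21.0, 22: 23.5, 25: 28.0}
```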


Case II 


We are concerned in Case II with the scores of thirty-seven
pupils on Form B of the Junior Dominion Intelligence Test and
the National Intelligence Test. The correlation between these
two sets of scores was 0.764. As in Case I, the different statistical
procedures were applied, but now slight variations are to be
noted. The prediction of B scores using the method of bisecting
the angle between the regression lines is the same whether raw or
standard scores are used. The standard deviations of the two
distributions must differ greatly before any noticeable change
results from using raw or ‘z’ scores. An illustration of this is
to be found in a comparison of the Advanced and Intermediate
Forms of the Dominion Tests of Intelligence, the standard
deviation of the former being 12 and of the latter 82. Here the
ratio is approximately 1:7. Let us assume the correlation between
the two to be 0.6. In this case

$$ b_{IA} = 0.6 \times \frac{82}{12} = 4.1, $$

where $b_{IA}$ is the slope of the regression line of Intermediate on
Advanced. Again,

$$ b_{AI} = 0.6 \times \frac{12}{82} = 0.088, $$

but with reference to the horizontal axis the reciprocal value,
i.e. $1/0.088 = 11.3$, must be used. Bisecting the angle between
these two regression lines, the slope of the mid-line (b') is found
to be 6.08. We have, therefore, the two slopes b = 7.0
(approximately; this is the ratio 82/12, the slope of the bisector
in standard scores expressed in raw-score units) and b' = 6.08.
It will be seen that when the ratio between the standard deviations
is great, the difference between the slopes of the two regression
lines is sufficient to affect the prediction using raw and standard
scores.
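
The slope arithmetic can be reproduced as follows. Only the standard deviations (12 and 82) and the assumed correlation (0.6) come from the text; the variable names are ours. Working with unrounded intermediate values, the bisector slope comes out at about 6.07, close to the 6.08 given above; the small gap presumably reflects rounding in the original hand calculation.

```python
import math

sd_adv, sd_int, r = 12, 82, 0.6  # values assumed in the illustration

# Regression of Intermediate on Advanced, with Advanced on the x axis.
b_int_on_adv = r * sd_int / sd_adv
# Regression of Advanced on Intermediate; against the same x axis its
# slope is the reciprocal of the regression coefficient.
b_adv_on_int = r * sd_adv / sd_int
steep = 1 / b_adv_on_int

# Bisect the angle between the two lines (raw scores).
b_prime = math.tan((math.atan(b_int_on_adv) + math.atan(steep)) / 2)
# In standard scores the bisector is z_I = z_A, i.e. slope sd_int / sd_adv in raw units.
b = sd_int / sd_adv

print(f"{b_int_on_adv:.2f} {steep:.2f} {b_prime:.2f} {b:.2f}")
```
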

The prediction of B scores using Otis’ ‘Line of Relation’ now
refers to two distinctly different tests and not to two forms of the
same test as considered in Case I. Otis (9) has shown that when
tests do not measure identical traits, i.e., $r_{xy} < 1.00$, the formula
for finding the true correspondence between scores on two tests is
$y = \sqrt{r_{yy}/r_{xx}}\,(\sigma_y/\sigma_x)\,x$.* The use of this formula gives results
practically identical (at most differing only by one in the units
digit) with those determined by bisecting the angle in standard
scores. The reason, of course, is that there is generally little
difference between the reliabilities of two different tests. Should
there be such a difference, the extraction of the square root brings
the factor $\sqrt{r_{yy}/r_{xx}}$ close to unity.


From the results shown for this case in Table V, it will be seen
that B and B' begin to show marked divergence, especially at the
two extremes of the distribution of National Intelligence Test
scores. The method of equating percentiles seems to give
predicted scores closely related to those resulting from the use
of other procedures. This would lead one to favor this method
for its relative accuracy and ease of calculation.


TABLE V. COMPARISON OF DIFFERENT TYPES OF PREDICTED SCORES ON FORM B
OF JUNIOR DOMINION AND NATIONAL INTELLIGENCE TESTS

  Observed   ------------ Predicted scores on Form B ------------   Observed
  on NIT                                    Otis      Equated       on Form B
     N        B      B'     B''    B_z''    line*     percentiles      B
     42      11       2      7       7       *            8           16
     58      20      15     18      18                   19           19
     75      30      29     30      30                   29           26
     92      40      43     42      42                   44           30
    112      52      60     55      55                   52           47

* Refer to predicted scores using Otis’ ‘Line of Relation’; these
differ from A_z'' or B_z'' by at most one point in the units digit.

* Actually, $h = \frac{r_{hy}\,\sigma_y}{r_{gx}\,\sigma_x}\,g$; see (8), p. 543, equations (49) and (50).
The symbols ‘h’ and ‘g’ denote factors common to the two tests X and Y.


Case III 


The results shown in Table VI are based upon the scores
obtained by thirty-five pupils on the National Intelligence Test
and Form A of the Junior Dominion Intelligence Test. The
correlation between the scores on these two tests was 0.472.
It will be seen that the predicted scores show greater variation
than in the two cases previously considered. Here impossible
results may appear when the second regression line (A') is used
for prediction. This statistical procedure, then, must be
discarded. A slight variation now appears between A'' and A_z'';
no longer are they identical as in Case II. This variation would be
more pronounced with high and low National Intelligence Test
scores, so that either method would lead to discrepancies at the
extremes of the National Intelligence Test scores. The equating
of percentiles, however, gives results which, in most cases, differ
little from the other A values. Because of this consistency and
ease of calculation, it would seem to be as suitable a method as
any for prediction purposes.
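
Why the second regression line misbehaves when the correlation is only 0.472 can be seen from the slopes in standard-score form: the regression of A on N has slope r, the bisector (and Otis’ line) slope 1, and the second regression line, solved for A, slope 1/r. Because each prediction is linear in the NIT score, the spread of each column of Table VI between NIT scores of 64 and 132 should roughly track these slopes. Reading the spreads off the table is our own illustration, not a calculation made in the article.

```python
r = 0.472  # correlation between NIT and Form A scores in Case III

# Slopes in standard-score form (multiples of S_A / S_N).
slopes = {"A": r, "A_z''": 1.0, "A'": 1 / r}

# Spread of each column of predictions over NIT scores 64 to 132 (Table VI).
spread = {"A": 69 - 44, "A_z''": 84 - 30, "A'": 112 - 0}
base = spread["A_z''"]

for name in slopes:
    print(f"{name:6} slope {slopes[name]:.2f}   spread ratio {spread[name] / base:.2f}")
```

The second regression line more than doubles the spread of the bisector’s predictions, which is how it reaches scores such as 0 at the bottom of the NIT range.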


TABLE VI. COMPARISON OF DIFFERENT TYPES OF PREDICTED SCORES ON FORM A
OF JUNIOR DOMINION AND NATIONAL INTELLIGENCE TESTS

  Observed   ------------ Predicted scores on Form A ------------   Observed
  on NIT                                    Otis      Equated       on Form A
     N        A      A'     A''    A_z''    line*     percentiles      A
     64      44       0     28      30       *           30           21
     77      49      21     39      40                   43           42
     98      57      56     56      57                   58           28
    117      64      87     72      72                   70           77
    132      69     112     84      84                   77           62

* Refer to predicted scores using Otis’ ‘Line of Relation’; these
differ from A_z'' or B_z'' by at most one point in the units digit.


CONCLUSIONS 


Examples have been given to illustrate some of the different 
possible statistical procedures which might be employed in 
obtaining mental age equivalents, either directly, from a model 
IQ distribution, or from a knowledge of scores on a test which 
has already been standardized. In connection with the latter 
method, examples were specially selected to cover different 
degrees of relationship between scores. It is felt that the results
show so little variation when the relationship between scores is
high that any one of the methods might be employed. Lower
relationships tend to cause variations in predicted scores. Equating
percentiles in each case, however, yields results in fair agreement 
with those obtained by using the other methods. Because of its 
simplicity, it would appear to be a fairly suitable method of 
predicting raw scores preparatory to calculating mental-age 
equivalents. In using this method, the standard deviations of 
the two distributions of mental ages are, of course, made the same. 
This is not always the case when we use the regression lines, or a 
line bisecting the angle between them. 
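
The remark about standard deviations can be made concrete with the figures of the Case II illustration (standard deviations 12 and 82, assumed correlation 0.6). A linear prediction with slope b has a standard deviation b times that of the predictor, so only the standard-score line, like equating percentiles, reproduces the spread of the target distribution. This is our own numerical illustration, not a computation from the article.

```python
import math

sd_x, sd_y, r = 12, 82, 0.6  # Advanced (x) and Intermediate (y) figures from Case II

b_reg = r * sd_y / sd_x
b_raw_bisector = math.tan((math.atan(r * sd_y / sd_x) + math.atan(sd_y / (r * sd_x))) / 2)
b_std_bisector = sd_y / sd_x

for name, b in [("regression", b_reg),
                ("bisector, raw scores", b_raw_bisector),
                ("bisector, standard scores", b_std_bisector)]:
    # A prediction y' = b * x has standard deviation b * sd_x (for b > 0).
    print(f"{name:26} predicted SD = {b * sd_x:5.1f}   target SD = {sd_y}")
```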

This discussion of the various possible methods of obtaining
mental-age equivalents or norms for a test raises several interesting
questions. If mental age is a fundamental concept and, hence,
the IQ rating a derived one, then presumably one should obtain the
mental-age norms directly. But which of the possible experimental
or statistical methods should be employed? If, on the
other hand, the IQ rating is assumed to be a fundamental
measure of an individual and of a defined group of individuals,
then presumably the mental age is a derived measure, i.e., it is a
means by which we can obtain IQ’s, not an end in itself. In this
case, the mental ages will presumably be obtained indirectly,
following some such method as that proposed by Jackson.

These and other relevant questions can be answered if, and 
only if, psychologists and educationists define exactly what is 
meant by the terms ‘mental age’ and ‘intelligence quotient.’ 
It is suggested that this should be done as soon as possible. As 
matters now stand, neither the mental ages nor the intelligence 
quotients obtained from apparently equivalent tests are neces- 
sarily directly comparable. These mental tests are designed to 
aid in the understanding of individuals, but, unless an unequivocal 
interpretation of the results can be given, little help can be 
obtained from them. 


REFERENCES 


(1) Godfrey H. Thomson: “The Mental Age Concept and the
Standardization of Group Tests.” Psychological Review, 1928, Vol.
xxxv, p. 406.

(2) Peter Sandiford: Foundations of Educational Psychology. New
York: Longmans, Green and Company, 1938, pp. 354-356.

(3) Lewis M. Terman: The Measurement of Intelligence. Boston:
Houghton Mifflin Company, 1916, pp. 52-53.

(4) Robert W. B. Jackson: Department of Educational Research,
University of Toronto, Canada.

(5) C. E. Kellogg and N. W. Morton: “Revised Beta Examination.”
Personnel Journal, 1934, Vol. xiii, No. 2, pp. 94-100.

(6) Arthur S. Otis: Statistical Method in Educational Measurement.
Yonkers-on-Hudson: World Book Company, 1923, pp. 102-105.

(7) Arthur S. Otis: “The Reliability of the Binet Scale and of Pedagogical
Scales.” Journal of Educational Research, 1921, Vol. iv, pp. 124-128.

(8) Arthur S. Otis: “The Method of Finding the Correspondence between
Scores on Two Tests.” Journal of Educational Psychology, 1922, Vol.
xiii, pp. 529-544.

(9) Arthur S. Otis: loc. cit., pp. 541-544.



THE TRANSFER EFFECTS OF A HOW-TO-STUDY 
COURSE UPON DIFFERENT IQ LEVELS AND 
VARIOUS ACADEMIC SUBJECTS 


SALVATORE G. DIMICHAEL 


Fordham University 


In another article (4), the writer presented data which
demonstrated that the how-to-study course as taught in the experiment
could be expected to increase substantially the knowledge of 
efficient study skills of ninth-grade pupils. It was found, over 
the period of one term, that students of average mental ability 
did not acquire more knowledge about better study techniques as 
a by-product of the regular subject-matter classes. Students of 
superior mental ability did obtain such knowledge in an inci- 
dental way. On the other hand, students who were given the 
special how-to-study course learned significantly more about 
effective study techniques than matched pupils. These increases 
can be termed the direct results of a how-to-study course, and the 
data proved the direct effects of such training to be appreciable. 

However, the crucial test of the efficacy of the course in how- 
to-study must lie in its ability to bring about higher accomplish- 
ment in the scholastic achievement of the pupils. This refers 
to the transfer effects of the course. By transfer is meant the 
extension and application of what was learned in the how-to- 
study classes to other subjects in the curriculum. The transfer 
effects which may have carried over to other subjects would 
include ideas, understandings, ideals, attitudes, techniques, and 
the like. In this follow-up report of the same experiment some of 
the possible transfer effects of the special instruction will be 
inferred from results on achievement tests in the regular subjects 
of the high school. 

The broad purpose of the present part of the investigation is to 
evaluate experimentally the transfer effects of a how-to-study 
course upon the achievement of pupils of different levels of intelli- 
gence and upon achievement in several academic subjects. The 
more specific questions which this study seeks to answer are: 

(1) Will the increased knowledge of efficient study skills be
translated into actual practice and be observed objectively in
superior achievement scores?

(2) Which level of intelligence will derive the most benefit
from the instruction given in such a course?

(3) In what academic subject—history, Latin, or algebra— 
is the greatest amount of transfer observed? 

The ninth grade, in which this experiment was conducted, 
has seemed to be an especially promising level to teach study 
skills. Educators generally believe that the “. . . art of study
ought in theory to be acquired as early in the student’s career as
possible. . .” (8, p. 2). Most previous experiments have
inferred that the how-to-study classes met the test of usefulness. 
However, each of these studies was deficient in one or more of the 
following respects: pupils in the groups were not closely matched 
on significant factors inherent in the experimental situation such 
as mental age, IQ, chronological age, curriculum, and school 
year; more important, teacher efficiency was not controlled; the 
number of cases was in some studies too small for reliability;
objective, standardized tests were not used to measure the 
amount of transfer; the results were not related to different 
levels of intelligence, nor were results related to different sub- 
jects in the curriculum. The present study has sought to avoid 
these shortcomings. 

The investigation included one hundred ninety-two cases in a 
matched-group, control-type experiment. The subjects were 
matched individually on the basis of seven criteria: IQ, MA, 
chronological age, sex, school year, curriculum, and teacher. 
On the basis of these criteria two groups were formed, one of 
one hundred two and the other of ninety subjects. The first was 
called the ‘Superior’ group, composed of those students whose IQ 
ranked them above the median of the class; the second was 
termed the ‘Average’ group (because their IQ’s placed them in 
the average range according to IQ classifications) and was made 
up of those students who ranked below the median of their class 
in mental ability. The mean IQ of the Superior group was 112.4, 
and of the Average group, 97.5. Table I contains the information 
pertinent to the comparability and description of the groups. It 
can be seen that a very close matching has been accomplished. 

The IQ’s of the pupils had been obtained as a routine school 
procedure just two months before the experiment began. The 
Otis Self-Administering Test, Higher Examination: Form A, was
given. It has a reported reliability of .917. The Coöperative
Medieval History Test, Provisional Form 1935, was used for
history; it has a reliability of .94 as determined by the
Spearman-Brown formula. For the subject of Latin, the
Coöperative Latin Test, Revised Series, Form Q, with a reliability
of .95, was employed as the initial test, and Form R was used as
the final test. The Coöperative Algebra Test, Elementary
Algebra Through Quadratics, Form Q, was given for the pre-test,
and Form R was administered at the close of the experiment.
Their reliability is .86 at the single grade level.


TABLE I. COMPARABILITY AND DESCRIPTION OF GROUPS

                         Aver.     SD      Age,         SD
GROUP              N     IQ        IQ      Yrs.-Mos.    Mos.
Superior
  Experimental     51    112.84    6.10    14-1         6.00
  Control          51    112.02    5.68    14-2         6.44
Average
  Experimental     45     96.96    4.86    14-6         8.92
  Control          45     98.00    4.79    14-6         8.48


The instruction in how-to-study was given to the experimental 
students in two forty-five-minute periods each week throughout 
the Spring term of 1942, totaling twenty-seven class sessions. 
The instructor frequently made use of the lecture method and 
also other methods when they were deemed appropriate to most 
effective learning; even blackboard games were utilized to 
heighten class interest. The instructor conducted the class 
informally, and occasionally spoke with the students outside of 
class about their school work, attitudes, ambitions, and the like. 
No outside preparation was required for this non-credit course. 

The experimenter took a broad point of view on the concept of 
study in his instructional efforts. Emphasis was laid upon study 
and study techniques as necessary not only for school but for 
present and future life activities as well. Study was always con- 
sidered as a real life activity of value and importance, to be used 
in factory, home, community, and school. This broad view of 
study was purposely adopted because the investigator con- 
sidered it to be a good medium for obtaining maximum transfer. 
The emphasis in the teaching was on the development of a 
clearer understanding of, and a desire to use, the superior study 
skills. 


Pupils of superior and average intelligence were instructed 
similarly in the how-to-study class. While the experimental 
group was given the special instruction, the control group 
remained in their regular ‘study’ class to work on daily class 
assignments. After the special instruction the experimental 
group returned to their regular class where similar influences 
acted upon both groups alike. The investigator wishes to call 
attention to the fact that this experimental set-up has made it 
possible to control such variables as teacher efficiency, class size, 
equipment, books, and assignments. 

In planning the units of the course, the experimenter used the 
results of the study by Laycock and Russell (11) as an objective
basis for validating the contents of the course. The purpose was 
to provide instruction both in those topics most frequently 
mentioned in how-to-study manuals for secondary-school pupils 
and also suited to the needs of the particular students. The 
following twelve units were explained in class: the effect of 
attitudes upon scholarly efficiency; concentration and the con- 
trol of attention; planning a time schedule; increasing speed of 
reading; increasing comprehension; increasing vocabulary; out- 
lining; notebooks and note-taking; writing a term paper, theme, 
or report; remembering and memorizing more efficiently; 
reviewing; and preparing for and taking examinations. 

The ‘t’ ratios were derived by the method credited to Student
and explained by Ezekiel (6, p. 449). These ratios may be
interpreted in terms of the null hypothesis and levels of
confidence (12, p. 52). When the ‘t’ ratio lies beyond the
five-per-cent-level of confidence, one may be reasonably certain that
the difference is not due to chance alone; at the one-per-cent-level
one may be practically certain of a true difference in the compared
statistics. The per cents refer to the number of chances in
one hundred that a greater difference will be obtained by mere
chance sampling.
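
Student’s method for matched pairs, as described here, divides the mean of the paired differences by its standard error. The Python sketch assumes raw paired scores are available (the five pairs shown are made up) and uses the sample standard deviation with n - 1 in the denominator, which gives the same standard error as the population standard deviation divided by the square root of n - 1. It then checks one entry of Table II, where the ‘t’ ratio is the net gain divided by its standard error. Converting t into a level of confidence still requires a table of the t distribution, which the sketch does not attempt.

```python
import math
from statistics import mean, stdev


def t_ratio_matched(exp_scores, con_scores):
    """Student's t for matched pairs: mean difference / (sd of differences / sqrt(n))."""
    if len(exp_scores) != len(con_scores):
        raise ValueError("each experimental pupil needs a matched control")
    diffs = [e - c for e, c in zip(exp_scores, con_scores)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


# Made-up scores for five matched pairs.
experimental = [52, 47, 60, 55, 49]
control = [50, 45, 57, 51, 44]
print(round(t_ratio_matched(experimental, control), 2))

# Table II, History, Superior group: net gain 5.74, standard error 2.11.
print(round(5.74 / 2.11, 2))  # 2.72
```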


RESULTS 


When the transfer effects of the course were determined for 
the groups differentiated by IQ and MA into Superior and 
Average, only one gain was observed to be statistically significant. 
This gain was made by the Superior experimental group in the 
subject of history which showed a superiority of 5.74 points over 
its control group. This improvement yields a ‘t’ ratio of 2.72 
which is significant at the one-per-cent-level, so that there is 
practical certainty that the difference is not due to chance 
sampling. This same Superior group exhibited a very slight, 
insignificant increase over the matched group in the subjects of 
Latin and elementary algebra. The Average experimental group 
showed slight, insignificant losses in history and algebra, but a 
gain over the control group in Latin which resulted in a ‘t’ ratio 
found beyond the twenty-per-cent-level of confidence. Thus, the 
transfer effects of the how-to-study course were found objectively 
to have affected favorably the history achievement of pupils of 
superior intelligence. There was some positive transfer effect 
also upon the Latin achievement of the Average group, but this 
was not statistically reliable. In algebra the small differences in
achievement could be attributed to mere chance sampling.


TABLE II. TRANSFER EFFECTS OF THE HOW-TO-STUDY COURSE

                          N       Net      SE of      't'      Level of
GROUPS                    Pairs   Gain     M diff.    Ratio    Confidence, Per Cent
History
  Sup. Exp.-Con.          48       5.74    2.11        2.72     1
  Aver. Exp.-Con.         41      -2.12    2.37        -.90    40
Latin
  Sup. Exp.-Con.          51        .25     .95         .26    80
  Aver. Exp.-Con.         45       1.59    1.08        1.47    20
Algebra
  Sup. Exp.-Con.          49        .17    1.06         .16    90
  Aver. Exp.-Con.         41       -.36    1.45        -.25    90


TRANSFER EFFECTS UPON QUARTER GROUPS 


In the above analysis the groups were one hundred two and 
ninety in number. It was decided to split each of these larger 
groups in half and note whether the results of the matched 
groups differentiated into quarters on the basis of mental ability 
would yield more exact interpretations of the effects of the 
how-to-study class. The quantitative descriptions of this 
quartile classification and the comparability of the matching have 
been presented in Table III. Considering both control and 
experimental groups together, the highest quarter had an average 
IQ of 116.8, which means that on the average they are at the 
84th percentile in intelligence (7, p. 302); the second quarter had
an average IQ of 108.8, which is at the 69th percentile;
the third quarter had an average IQ of 101, at about the 50th
percentile; and the lowest quarter had an average of 94.2, which
is found at the 36th percentile of a normal distribution of IQ
scores.
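
The percentiles quoted were read from Freeman’s table. A rough check is possible with Python’s `statistics.NormalDist`, under the assumption (ours, not the article’s) that IQ is normally distributed with mean 100 and standard deviation 16. On that assumption the four means fall at about the 85th, 71st, 52nd, and 36th percentiles, within two points of the figures in the text; a different assumed standard deviation would shift them slightly.

```python
from statistics import NormalDist

# Assumption: IQs normally distributed with mean 100 and SD 16.
iq = NormalDist(mu=100, sigma=16)

quarters = [("highest", 116.8), ("second", 108.8), ("third", 101.0), ("lowest", 94.2)]
for label, mean_iq in quarters:
    print(f"{label:8} quarter, mean IQ {mean_iq}: percentile {100 * iq.cdf(mean_iq):.0f}")
```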


TABLE III. COMPARABILITY AND DESCRIPTION OF THE MATCHED GROUPS
DIVIDED INTO QUARTERS ACCORDING TO LEVELS OF INTELLIGENCE

                         MEAN             MEAN AGE,
GROUP               N    IQ       SD      Yrs.-Mos.    SD
Highest Quarter
  Experimental      23   117.4    5.86    14-0         5.75
  Control           23   116.2    5.47    14-0         5.88
Second Quarter
  Experimental      28   108.9    2.98    14-2         5.86
  Control           28   108.7    2.70    14-3         6.52
Third Quarter
  Experimental      22   100.5    2.60    14-4         7.92
  Control           22   101.5    2.33    14-3         6.30
Lowest Quarter
  Experimental      23    93.6    3.56    14-10        8.30
  Control           23    94.7    4.18    14-10        8.96


The results in Table IV definitely reveal that the previous data 
can be satisfactorily refined. In the subject of history it was 
found that the second-quarter group made the greater contri- 
bution to the observed significantly improved achievement of 
the total Superior group. The ‘t’ ratio of 2.49 obtained from the 
differences between the means of the experimental over the 
matched control group is beyond the two-per-cent-level of 
confidence. This may be accepted as a practically reliable 
difference between the observed mean scores. 

In the subject of Latin, the division of the two large groups 
into quarters manifests another interesting trend. The third 
quarter now shows a statistically significant gain although in the 
analysis of the total Average group there was only a statistically 
unreliable gain for the experimental students. This quarter, 
whose mean IQ was 101, has proved its capacity to achieve a 
definite positive transfer from the how-to-study course to the 
Latin test. On the other hand, the first, second, and lowest 
quarters show no differences worthy of notice, since all of them are
highly unreliable.

In algebra the special course does not seem to have affected 
achievement scores of the pupils even when the means of the 
quarter groups are examined. The very small difference 
between the large Average control and experimental groups, as 
seen in the previous section, splits in opposite directions when
the Average group is divided in half on the basis of mental 
ability. The change tends to be favorable to the third-quarter 
students who were enrolled in the how-to-study class, and 
unfavorable to the experimental students of the lowest quarter. 
These latter differences have to be interpreted cautiously, since 
neither of the results approaches statistical significance. 


TABLE IV. TRANSFER EFFECTS OF THE HOW-TO-STUDY COURSE UPON DIFFERENT
LEVELS OF INTELLIGENCE IN HISTORY, LATIN, AND ALGEBRA

ACADEMIC     IQ       MEAN GAIN    SE OF     't'      LEVEL OF
SUBJECT      LEVEL    EXP.-CON.    M DIFF.   RATIO    CONFIDENCE
History      116.8      3.20       3.02       1.06     30%
             108.8      7.41       2.97       2.49      2%
             101.0      -.96       3.74       -.26     80%
              94.2     -4.25       3.01      -1.41     20%
Latin        116.8      -.48       1.56       -.31     80%
             108.8       .86       1.15        .75     50%
             101.0      3.73       1.27       2.94      1%
              94.2      -.47       1.61       -.29     80%
Algebra      116.8      -.20       1.40       -.14     90%
             108.8       .44       1.55        .28     80%
             101.0      2.55       1.72       1.48     20%
              94.2     -3.74       2.26      -1.66     20%


If the trends shown in the ‘t’ ratios of the various quarter 
groups are inspected in Table IV, there is observable a tendency 
for the middle experimental groups, with mean IQ’s of 108.8 and 
101, to manifest the most benefit from the how-to-study course. 
The comparisons in the three academic subjects demonstrate a 
positive gain for the middle experimental groups in five out of 
six comparisons (one being statistically reliable at the one-per-cent-level
of confidence, and the other at the two-per-cent-level);
whereas for the highest and lowest groups, it is the control stu- 
dents who maintain a very slight superiority in five out of six 
comparisons (but none of these even approached statistical
significance).
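
The ‘five out of six’ counts can be verified directly from Table IV. The dictionary below transcribes the mean gains (experimental minus control); the layout is our own.

```python
# Mean gains, experimental minus control, transcribed from Table IV.
gains = {
    116.8: {"history": 3.20, "latin": -0.48, "algebra": -0.20},
    108.8: {"history": 7.41, "latin": 0.86, "algebra": 0.44},
    101.0: {"history": -0.96, "latin": 3.73, "algebra": 2.55},
    94.2: {"history": -4.25, "latin": -0.47, "algebra": -3.74},
}

middle = [g for level in (108.8, 101.0) for g in gains[level].values()]
extremes = [g for level in (116.8, 94.2) for g in gains[level].values()]

print(f"middle quarters, experimental ahead: {sum(g > 0 for g in middle)} of {len(middle)}")
print(f"highest and lowest, control ahead: {sum(g < 0 for g in extremes)} of {len(extremes)}")
```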


CONCLUSIONS 


In subsequent paragraphs, the results of this experiment will 
not be considered solely by themselves; rather, the interpre- 
tations of the present data will be examined in the light of the 
more important results obtained by other investigators who have 
reported on the effectiveness of a how-to-study course. 

(1) The data of this experiment have led the investigator to
conclude that the course, as taught in this study, has proved its
value for the middle groups of mental ability. One of these
middle groups benefited significantly in history and the other
made a significant, positive transfer in Latin achievement.
Moreover, the middle quartile groups made a statistically
significant gain in knowledge of study skills, as was described in
the first report (4). Of the other investigators in this field, Pressey,
Jones (9), Bird (1), Wagner and Strabel, Book (2), Moore (14),
Gatchel (8), and Crawford (3) agree with this investigator on the
value of the course. Mills (13) disagrees entirely. Turrell maintains
that such a course was unnecessary in the junior college where he
conducted his investigation. However, he found that the
middle-third group in intelligence who took the one-unit
how-to-study course made a superior showing in seventy-one per cent
of the comparisons with the control group. Winter believes in its
short-term but not its long-term effects.

(2) The present study lends weight to the conclusion formulated
by Pressey and also by Bird (1) that students of very poor
ability, that is, below the first quartile in intelligence, do not
profit noticeably from the course.

(3) The how-to-study course has not demonstrated its value 
objectively for the students in the highest quarter of mental 
ability. These very superior students did make a significant 
gain in the knowledge of proper study skills as a result of the 
course, but such additional knowledge did not transfer to other
subjects in the curriculum to any appreciable degree. Only in the
subject of history did the very superior
experimental pupils show any advantage, but this gain in 
achievement was not by itself statistically reliable. The data of 
the present experiment partially contradict the conclusion 
formulated by Bird (1), which states that the course will benefit the
student in direct proportion to his ability level provided it pre- 
cedes academic defeat. Such a conclusion seems to be only 
partly true. 

(4) Though the how-to-study course will improve the achieve- 
ment of some groups, the improvement will not be enough to 
enable how-to-study pupils of a lower quarter group to compete 
on an equal plane with students in the next higher quarter in 
intelligence who have not been given the special instruction. In 
other words, the instruction in best methods of study may result 
in greater returns in proportion to one’s ability, but it will not 
materially compensate for lack of mental ability. This is stated 
as a general rule; individual exceptions have been noted in most 
experimental reports. In this experiment the various quarter 
intelligence groups maintained without exception a corresponding 
rank in achievement within the different academic fields. This 
conclusion agrees with studies by Eckert and Jones (5) and by Bird (1).

(5) The special instruction in how-to-study transferred in 
varying amounts to the different academic subjects of history, 
Latin, and algebra. Some differences observed between the 
experimental and the control groups were positive, others 
slightly negative, others were insignificant. It is possible, then, 
that the gains in any one subject may be averaged out by being 
combined with scores in other subjects in which there may appear 
no gain, or even a slight loss. For this reason, the transfer 
effects of the how-to-study course should be judged with refer- 
ence to a particular subject, and not with reference to the total 
average grade of the student in all subjects together. 

(6) The transfer effects upon the different academic subjects 
have not been similar from study to study. Jones (10) is probably
correct in implying that the different results are due to the 
dissimilar materials employed in the how-to-study courses. 

(7) It was observed that the instruction in how-to-study 
transferred in varying amounts upon the achievement of pupils 
on different levels of intelligence. Because of this fact, the 
effects of the course upon pupils of a certain level of intelligence 
may be obscured by being averaged out in the achievement 
scores of different levels of intelligence. Therefore, the transfer
effects must be determined with reference to the level of intelli- 
gence of the students to whom the course is given. 


BIBLIOGRAPHY

1. Charles Bird: Effective Study Habits. New York: The Century Co.,
1931.

2. W. F. Book: “Results Obtained in a Special ‘How-to-Study’ Course
Given to College Students.” School and Society, Vol. xxvi, pp.
529-534, 1927.

3. C. C. Crawford: “Some Results of Teaching College Students How to
Study.” School and Society, Vol. xxiii, pp. 471-472, 1926.

4. S. G. DiMichael: “Increase in Knowledge of How to Study Resulting
from a How-to-Study Course.” School Review, Vol. li, pp. 353-359,
1943.

5. R. E. Eckert and E. S. Jones: “Long Time Effects of Training College
Students How to Study.” School and Society, Vol. xi, pp. 685-688,
1935.

6. Mordecai Ezekiel: “Student’s Method for Measuring the Significance
of a Difference Between Matched Groups.” Journal of Educational
Psychology, Vol. xxi, pp. 446-450, 1932.

7. Frank N. Freeman: Mental Tests. Boston: Houghton-Mifflin Co., 1939.

8. D. F. Gatchel: “Results of a How-to-Study Course Given in High
School.” School Review, Vol. xxxix, pp. 123-129, 1931.

9. E. S. Jones: “Testing and Training the Inferior or Doubtful Freshman.”
Personnel Journal, Vol. vi, pp. 182-191, 1927-1928.

10. E. S. Jones: “The Preliminary Course on ‘How to Study’ for Freshmen
Entering College.” School and Society, Vol. xxix, pp. 702-705, 1929.

11. S. R. Laycock and D. H. Russell: “An Analysis of Thirty-eight
How-to-Study Manuals.” School Review, Vol. xlix, pp. 370-379, 1941.

12. E. F. Lindquist: Statistical Analysis in Educational Research. Boston:
Houghton-Mifflin Co., 1940.

13. H. C. Mills: “How to Study Courses and Academic Achievement.”
Educational Administration and Supervision, Vol. xx, pp. 619-624,
1934.

14. H. Moore: “Training College Freshmen to Read.” Journal of Applied
Psychology, Vol. xviii, pp. 631-634, 1934.

15. L. C. Pressey: “The Permanent Effects of Training in Methods of
Study on College Success.” School and Society, Vol. xxvii, pp.
403-404, 1928.

16. A. M. Turrell: “Study Methods and Scholarship Improvement.”
Junior College Journal, Vol. vii, pp. 295-301, 1937.

17. M. E. Wagner and E. Strabel: “Teaching High-school Pupils How to
Study.” School Review, Vol. xliii, pp. 577-589, 1935.

18. Guy M. Whipple: “Experiments in Teaching Students How to Study.”
Journal of Educational Research, Vol. 1, pp. 1-11, 1929.

19. J. E. Winter: “An Experimental Study of the Effect on Learning of
Supervised and Unsupervised Study Among College Freshmen.”
Journal of Educational Psychology, Vol. xxvii, pp. 111-118, 1936.







THE RELATIVE IMPORTANCE OF SUCCESS AND 
FAILURE IN LEARNING, AS RELATED TO 
CERTAIN INDIVIDUAL DIFFERENCES 


J. W. TILTON 


Yale University 


In the learning exercises which provided the data for this study,
forty subjects chose one of four response syllables following a
stimulus syllable. According to a prearranged “chance” pattern,
the experimenter told the subject when he was right and
when wrong. In order that a “measure” of the influence of
“right” and “wrong” might be obtained, a preliminary exercise
had been planned in which the subjects repeatedly responded to
the same or similar choice situations without the informative
“right” and “wrong” from the experimenter. This exercise
gave a basal or zero amount of repetition from which to measure.
To have assumed a chance amount of repetition as a base would
have introduced a serious error: it would have led to the
conclusion that “wrong” interfered with learning. As compared
with the experimentally obtained base, “wrong” reduced
repetition slightly more than “right” increased it.¹

Although an assumed chance amount of repetition would have
provided much too low a base, the base used was slightly high.
By the mere introduction into the exercise of the “right” and
“wrong” announcements, and not because of their differential
influence, repetition was reduced. This general reduction in
repetition made “right” seem less effective than it was, and made
“wrong” appear more effective. Corrected for this error in the
base, “right” and “wrong” were found to be about equally
effective.

This result has been considered contrary to a hypothesis which
has been advanced with supporting data.² The hypothesis is
that upon weak associations reward has great influence and
punishment little; upon strong associations, punishment is very
effective, reward only slightly so. Weak associations are
thought of as of chance frequency. According to this hypothesis,
and starting with associations between meaningless syllables, one
would expect to find “right” much more effective than “wrong.”

¹ A full report was made under the title, “The Effect of ‘Right’ and
‘Wrong’ Upon the Learning of Nonsense Syllables in Multiple Choice
Arrangement.” Journal of Educational Psychology, vol. xxx, 1939, pp.
95-115.

² Stephens, J. M.: “Some Anomalous Results of Punishment in Learning:
A Preliminary Note.” School and Society, vol. lii, December 28, 1940, pp.
703-704.

Stephens, J. M.: “The Influence of Symbolic Punishment and Reward Upon
Strong and Upon Weak Associations.” Journal of General Psychology,
vol. xxv, 1941, pp. 177-185.

But perhaps the hypothesis needs modification. An analysis
of the data from the learning of the syllables showed¹ that the
spurious correlation of initial scores with gains² may have been
responsible for the supposed validation of the hypothesis. The
analysis referred to covered only the range of association
strength from chance halfway up to one hundred per cent
repetition. But within the range it did cover, a positive
correlation was found not only between the initial or zero amount of
repetition and the influence of “wrong,” but also, though less
reliably, between that initial strength and the effectiveness of
“right.”
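The spurious correlation of initial scores with gains arises because the same error of measurement enters the initial score with a plus sign and the gain with a minus sign. The following minimal simulation uses entirely hypothetical parameters, not Tilton's data. Every simulated subject has exactly the same true gain, yet the observed initial scores and observed gains still correlate negatively. If changes are instead scored as decreases, as the article does for “wrong,” the sign of the artifact flips, and it inflates a positive correlation.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_initial_gain_r(n=10_000, error_sd=5.0, seed=1):
    """Correlate observed initial scores with observed gains when every
    subject's true gain is identical, so the true correlation is zero.

    Hypothetical parameters: true initial repetition ~ N(45, 8) per cent,
    a constant true gain of 5 points, and independent measurement error
    with standard deviation `error_sd` on each occasion.
    """
    rng = random.Random(seed)
    initial_obs, gains_obs = [], []
    for _ in range(n):
        true_initial = rng.gauss(45, 8)
        true_final = true_initial + 5.0  # the same true gain for everyone
        obs_initial = true_initial + rng.gauss(0, error_sd)
        obs_final = true_final + rng.gauss(0, error_sd)
        initial_obs.append(obs_initial)
        gains_obs.append(obs_final - obs_initial)
    return pearson_r(initial_obs, gains_obs)


r_noisy = simulate_initial_gain_r()
print(round(r_noisy, 2))
```

With these settings the expected value is about -.37, which arises entirely from measurement error. Shrinking `error_sd` pushes it toward zero.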


TABLE I. THE CONTRIBUTIONS OF SUCCESS AND FAILURE TO
THE LEARNING OF THREE GROUPS DIFFERING IN THE
AMOUNT OF INITIAL REPETITION
(Uncorrected for Regression)

Number of   Amount of initial      Average increase       Average decrease
subjects    repetition, per cent   effected by “right”    effected by “wrong”
   11       50 or above                  4.9                    23.9
   14       40-49                    (illegible)                11.2
   15       below 40                    10.0                     5.7


Another analysis remains to be reported. How is the
effectiveness of “right” related to individual differences in the amount
of zero or basal repetition? Similarly, how is the influence of
“wrong” related to these individual differences?





¹ Tilton, J. W.: “Effect as Determined by Initial Strength of Response.”
An article to appear in the Journal of General Psychology.

² McNemar, Q.: “A Critical Examination of the University of Iowa
Studies of Environmental Influences Upon the IQ.” Psychological Bulletin,
vol. xxxvii, 1940, pp. 63-92, especially pp. 84-90.

Zieve, L.: “Note on the Correlation of Initial Scores With Gain.”
Journal of Educational Psychology, vol. xxxi, 1940, pp. 391-394.


The results are reported, first, uncorrected for the spurious
correlation between initial measures and changes in those
measures. In the case of “right” the correlation is -.19
between the initial amount of repetition and increases. For
“wrong,” the correlation is +.79 between the amount of initial
repetition and the size of the decrease.¹ In averages, these
results are shown in Table I.

For “true” measures, the corrected correlation between initial
repetition and increases effected by “right” is +.21. For
decreases effected by “wrong” it is +.99.² Averages for “true”
measures are shown in Table II.


TABLE II. THE CONTRIBUTIONS OF SUCCESS AND FAILURE TO
THE LEARNING OF THREE GROUPS DIFFERING IN THE
AMOUNT OF INITIAL REPETITION
(Corrected for Regression)

            Amount of initial      Average increase       Average decrease
Number of   repetition, per cent   effected by “right”    effected by “wrong”
subjects    (“true” measures)      from “true” initial    from “true” initial
    7       50 or above                  9.1                    22.3
   18       40-49                        7.6                    12.2
   15       below 40                     7.5                     8.1
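The “true” measures rest on two standard adjustments: regressing observed scores toward the group mean, and correcting a correlation for attenuation. Thomson's (1924) formula, which the article actually used, is not reproduced here. This sketch shows only the two textbook steps, assuming Kelley's regression estimate of true scores and Spearman's correction for attenuation as the standard versions. These may differ in detail from the procedure behind the reported figures, and the function names and sample values are hypothetical. Because every quantity is a sample estimate, a “corrected” r can exceed 1.0.

```python
import math


def kelley_true_score(observed, group_mean, reliability):
    """Kelley's regression estimate of a true score: the observed score is
    pulled toward the group mean in proportion to its unreliability."""
    return reliability * observed + (1 - reliability) * group_mean


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: the correlation expected between error-free
    measures, given the observed r and the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical subject: 60 per cent initial repetition in a group averaging 45,
# using a reliability of .69 for initial repetition.
true_initial = kelley_true_score(60, 45, 0.69)
print(round(true_initial, 2))

# Hypothetical observed r of .50 between measures with reliabilities .64 and 1.0.
r_corrected = correct_for_attenuation(0.50, 0.64, 1.0)
print(round(r_corrected, 3))
```

A subject who scores above the mean is regressed downward (60 becomes 55.35 here), and an observed r of .50 rises to .625 once the unreliability of one measure is allowed for.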


The variance between the groups in the effect of “wrong” is
significantly greater than the variance within them.³ The differences
are not significant for “right.” These results are very
similar to those reported when groups of responses of differing
initial strength were studied.

¹ The correlation is minus in both cases between the initial score and
changes in it. But, since the contribution to learning is the function being
evaluated, decreases effected by “wrong” are treated as plus.

² The formula corrects for uncorrelated as well as correlated errors. The
resultant may be, as in this case, more of a correction for attenuation than
for correlated errors. See especially pp. 323-324 of Thomson, G. H.: “A
Formula to Correct for the Effect of Errors of Measurement on the
Correlation of Initial Values with Gains.” Journal of Experimental Psychology,
vol. vii, 1924, pp. 321-324. How this operates may be shown objectively
by correcting the correlations in two stages. If all initial scores are regressed
into “true” scores before the changes are computed, the r’s are +.11 and
+.67. So far, the r for “right” is raised and the r for “wrong” lowered.
But then correction for attenuation makes them +.18 and +1.04.

³ F is greater than required for the one per cent test in the Snedecor Table,
pp. 180-183 in Davenport, C. B. and Ekas, M. P.: Statistical Methods in
Biology, Medicine and Psychology. New York: John Wiley and Sons, Inc.,
1936.
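The comparison of variance between groups with variance within them is a one-way analysis of variance. The following is a minimal sketch on invented scores, not the article's data. With three groups totalling forty subjects, the article's F would have 2 and 37 degrees of freedom and would be compared with the tabled one per cent value.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way analysis of variance: F = MS(between groups) / MS(within groups).

    Returns the F ratio and its (between, within) degrees of freedom.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Invented scores for three small groups, for illustration only.
f_ratio, df = one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_ratio, df)
```

Here the group means (2, 5, 8) differ far more than the scores within each group, so F is large (27 on 2 and 6 degrees of freedom).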

Neither in that analysis nor in the present one are the data
extensive enough to establish with certainty the positive correlations
between initial amount of repetition and the effect of “right” upon that
repetition. A great deal of data went into the measures for the
forty subjects studied in the present article, but a low correlation
is hard to prove. The reliabilities are .69 for initial repetition,
and .59 and .35, respectively, for the effect of “right” and the effect
of “wrong.” If the correlation for “right” is in general about
as reported, it would take data from four hundred subjects to
prove the relationship beyond the sample studied. The following
conclusions are therefore drawn only tentatively and chiefly
for their significance in relation to the hypothesis referred to
earlier in this article.
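The article does not show how the figure of four hundred subjects was reached. This hedged sketch shows why forty subjects are too few. It assumes the conventional large-sample standard error of r, (1 - r²)/√N, and a critical ratio of about 3 as the usual threshold. The function name is illustrative.

```python
import math


def critical_ratio(r, n):
    """r divided by its conventional large-sample standard error, (1 - r**2) / sqrt(n)."""
    return r / ((1 - r ** 2) / math.sqrt(n))


# A correlation of about .2, close in size to those reported for "right".
for n in (40, 400):
    print(n, round(critical_ratio(0.2, n), 2))
```

With forty subjects the ratio is only about 1.3. With four hundred it is about 4.2, comfortably beyond 3.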

For these data, it was upon the strongest associations, and for
persons whose initial repetition was highest, that “wrong” was
more effective than “right.” This is in agreement with the
hypothesis referred to. But upon the responses repeated only
slightly above a chance amount, and for persons who were
initially repeating only slightly above a chance amount, “right”
and “wrong” were about equally effective. This equality does
not agree with the hypothesis. To fit these data, the hypothesis
should predict that it is in situations where responses are quite
varied, and with individuals whose responses are quite varied,
that “right” and “wrong” may be expected to be equally
helpful in learning.


REFERENCES

C. B. Davenport and M. P. Ekas: Statistical Methods in Biology, Medicine
and Psychology. New York: John Wiley and Sons, Inc., 1936.

Q. McNemar: “A Critical Examination of the University of Iowa Studies of
Environmental Influences Upon the IQ.” Psychological Bulletin,
vol. xxxvii, 1940, pp. 63-92, especially pp. 84-90.

J. M. Stephens: “Some Anomalous Results of Punishment in Learning: A
Preliminary Note.” School and Society, vol. lii, December 28, 1940,
pp. 703-704.

J. M. Stephens: “The Influence of Symbolic Punishment and Reward upon Strong
and upon Weak Associations.” Journal of General Psychology, vol.
xxv, 1941, pp. 177-185.

G. H. Thomson: “A Formula to Correct for the Effect of Errors of Measurement
on the Correlation of Initial Values with Gains.” Journal of
Experimental Psychology, vol. vii, 1924, pp. 321-324.

J. W. Tilton: “The Effect of ‘Right’ and ‘Wrong’ upon the Learning of
Nonsense Syllables in Multiple Choice Arrangement.” Journal of
Educational Psychology, vol. xxx, 1939, pp. 95-115.

J. W. Tilton: “Effect, as Determined by Initial Strength of Response.” To appear
in the Journal of General Psychology.

L. Zieve: “Note on the Correlation of Initial Scores with Gain.” Journal
of Educational Psychology, vol. xxxi, 1940, pp. 391-394.


DIFFICULTIES ENCOUNTERED IN MEASURING 
CHANGES IN INTELLIGENCE BY THE TEST- 
RETEST PROCEDURE 


H. G. JOHNSON 


University of Minnesota 


The research worker who desires to make inferences about 
changes in intelligence on the basis of tests administered within 
short intervals faces problems that are distinct from, and in addition
to, the problems encountered in measuring intelligence itself.
A common practice consists of giving an intelligence test before
an experiment and, after an interval often lasting less than one
year, administering another test to determine what changes in
IQ have occurred as a result of certain experimental conditions. It is the purpose of
this paper to call attention to certain special difficulties faced by 
the investigator when he uses such a procedure. 

Technical problems and hazards in measuring intelligence,
as well as changes in intelligence, have been competently
discussed by Goodenough (4) and McNemar (8) and will not be
commented on here. The main problem taken up in this paper
arises out of the theoretical background of mental testing. A 
disregard of the techniques of mental testing often leads to 
incorrect test results; in like manner, a disregard of the theory of 
mental testing may sometimes lead to an improper interpretation 
of test results. 

We have not been able to measure intelligence directly, so 
psychologists have sought to get estimates of it through the 
indirect method of measuring skills and knowledges that every- 
one has had about the same opportunity to acquire. It seems 
that performance on mental tests is determined, in some degree, 
by the total skills and knowledges that a child has been able to 
accumulate since birth. If this is so, then such increments as a 
child may be able to add to these skills and knowledges in rela- 
tively short periods, such as six months or a year, will likely be 
rather small compared to the total that he has acquired since he 
was born. This will be especially true for older children. As 
long as we measure intelligence indirectly, we must give an 
improved intelligence time to function before we can make any 
estimate of the extent of the improvement. In other words, if 
we have reason to believe that the intelligence of a child has
been improved, we must give this child time to make use of this 
improved intelligence in the acquisition of the skills and knowl- 
edges by which we estimate his mental status before we can 
determine the extent of the improvement. 

To illustrate this point, let us consider the problem of deter- 
mining the gain in IQ that may result from physical betterment, 
such as the removal of adenoids and diseased tonsils or the 
improved health resulting from a more balanced diet. Suppose 
we wish to establish the effects of the removal of adenoids upon 
the intelligence of a group of children by the test-retest procedure. 

For convenience of computation, let us take the instance of a 
child nine years old at the beginning of the experimental period. 
He is given a mental test and his IQ is found to be 100. His 
adenoids are removed and a year later he is given another test to 
determine whether or not an improvement in intelligence has 
resulted. Suppose an improvement of ten per cent actually did 
occur. How would this improvement affect his IQ? 

Since this child had an IQ of 100 before the operation, his gain 
in mental age amounted to one year during each chronological 
year. After the operation, due to a ten-per-cent improvement in 
mental ability, his gain in mental age amounted to 1.1 years for 
each chronological year. One year after his adenoids were 
removed, this child would then have a chronological age of 10 
years, a mental age of 10.1 years, and an IQ of 101, compared
with an IQ of 100 before the operation. A gain of one point in
IQ is usually considered insignificant. If our line of reasoning is 
correct, a child of nine, due to some physical betterment, may 
have a ten-per-cent improvement in mental ability which we 
cannot detect by retesting within one year. 
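The arithmetic of the adenoid example can be checked directly. This sketch assumes the ratio IQ (100 × mental age ÷ chronological age) and a constant yearly growth rate of mental age. The function names are illustrative only.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: 100 times mental age divided by chronological age (both in years)."""
    return 100 * mental_age / chronological_age


def iq_after(ca_start, iq_start, years, growth_rate):
    """IQ after `years`, if mental age grows by `growth_rate` years per
    calendar year (1.0 = normal growth, 1.1 = a ten-per-cent improvement)."""
    ma_start = ca_start * iq_start / 100
    return ratio_iq(ma_start + growth_rate * years, ca_start + years)


# The nine-year-old of the example, one year after the operation.
print(round(iq_after(9, 100, 1, 1.1), 1))
```

A ten-per-cent faster growth of mental age moves the IQ only from 100 to 101 within the year, which is the gap the text describes as undetectable.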

It should be understood that the writer is not trying to prove 
that any gain in IQ will result from an improvement in a child’s 
physical condition. The purpose here is merely to point out 
the difficulty of measuring changes in intelligence by retesting 
within short intervals of time. 

Again, let us consider the problem of attempting to determine 
the effects of improved reading ability on the IQ by means of 
the test-retest procedure. Suppose a group of children is given 
remedial work in reading with the result that in one year they 
gain half again as much as they ordinarily would have gained. 
Let us assume that these children were average in their reading 





Difficulties Encountered by Test-retest Procedure 183 


ability and that in one year they raised their reading ages eighteen 
months instead of the usual twelve months. What effect will 
this extra gain of six months in reading age have upon the IQ? 

Experiments on transfer of training indicate that transfers 
of one hundred per cent are rare and that in most cases the 
amount of transfer is twenty per cent or less. Let us assume 
that in our case there was a transfer of twenty per cent of the 
gain in performance on the reading test to performance on the 
intelligence test. The extra gain of 6 months in reading age will 
thus result in an extra gain of 1.2 months in mental age. 

A child with a chronological age of 9 and a mental age of 9 at 
the beginning of the experiment will have a chronological age of 
10 and a mental age of 10 years, 1.2 months at the end of the 
experiment. His IQ before beginning remedial work will be 
100 compared to an IQ of 101 at the end. Such a gain is not 
considered significant and cannot be measured accurately by our 
present mental tests. 
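The reading example follows the same arithmetic. This sketch assumes a child who starts at IQ 100 and gains the usual twelve months of mental age. It then adds the assumed twenty-per-cent transfer of the extra six months of reading age. The function name is hypothetical.

```python
def iq_with_reading_transfer(ca_start, extra_reading_months, transfer):
    """Ratio IQ after one year for a child who starts at IQ 100, gains the
    usual twelve months of mental age, plus `transfer` times the extra
    months of reading age."""
    ma_start = ca_start  # IQ 100 means mental age equals chronological age
    extra_ma_years = extra_reading_months * transfer / 12  # 6 * 0.20 = 1.2 months
    ma_end = ma_start + 1 + extra_ma_years
    return 100 * ma_end / (ca_start + 1)


print(round(iq_with_reading_transfer(9, 6, 0.20), 1))
```

The extra 1.2 months of mental age (0.1 year) again yields an IQ of only 101 at age ten.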

If the above line of reasoning is correct, then there may be, in 
situations similar to that just outlined, a twenty-per-cent transfer 
from reading ability to mental test performance which cannot 
be detected by the test-retest procedure. 

In an experiment of this nature, we must consider the possi- 
bility that achievement in other school subjects that correlate 
positively with the IQ may also affect mental test performance, 
and records should be kept in these subjects to see that normal 
progress is made during the experimental period. There is a 
possibility that when a great deal of emphasis is placed on reading 
improvement, progress in other subjects may be slower than 
normal. 

Again, let it be mentioned that the purpose here is not to 
furnish evidence that reading ability has any influence upon the 
IQ. These hypothetical cases have been cited for the sole pur- 
pose of illustrating the difficulty of measuring changes in intelli- 
gence by retesting within short intervals. 

While on this matter of explanation, and if the reader will pardon
a slight digression, the writer would also like to make clear
what he means by such phrases as ‘changes in mental ability’ 
and ‘a ten-per-cent improvement in intelligence.’ It is con- 
ceivable that a child will make better use of whatever mental 
ability he possesses when he is in good health than when he is 














184 The Journal of Educational Psychology 


ailing; consequently any gain in IQ that may result from an 
improvement in health, due to a better diet or other factors, 
may be caused by the fact that a child makes better use of his 
innate capacity rather than by the fact that an actual improve- 
ment in innate mental capacity takes place. In like manner, 
it is quite possible that a child in a nursery school or in a good 
home has greater opportunity to make use of his innate mental 
capacity in the acquisition of the skills and knowledges measured 
by intelligence tests than has a child in an orphanage or in an 
inferior home. The reader, therefore, may at his option interpret 
the phrase, ‘improvement in intelligence,’ to mean any one or all 
of the following: (a) an actual improvement in a variable intelli- 
gence, (b) an improvement in the use of a constant intelligence, 
and/or (c) an improvement in the opportunity to make use of a 
constant intelligence. 

It is evident that, because the IQ is computed as the ratio of mental
age to chronological age, it is much easier to change the IQ’s of
younger than of older children.
In equal periods of time, with equal changes in mental ages, 
the IQ of a five-year-old child will change nearly twice as much 
as the IQ of a ten-year-old child. Similarly, the change in IQ 
for an experiment that lasts two years will be nearly twice as 
great, other things being equal, as the change in IQ for an experi- 
ment that lasts only one year. 
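Both claims can be checked with the ratio IQ. This sketch assumes a child who starts at IQ 100 and whose mental age grows by an extra 0.1 year per calendar year. The function name and the figure 0.1 are illustrative.

```python
def iq_change(ca_start, years, extra_rate):
    """Change in ratio IQ for a child starting at IQ 100 whose mental age grows
    by (1 + extra_rate) years per calendar year."""
    ma_end = ca_start + years * (1 + extra_rate)
    ca_end = ca_start + years
    return 100 * ma_end / ca_end - 100


# Same extra gain (0.1 year of mental age in one year) at two ages.
age_ratio = iq_change(5, 1, 0.1) / iq_change(10, 1, 0.1)
# Same child (age nine) followed for two years instead of one.
duration_ratio = iq_change(9, 2, 0.1) / iq_change(9, 1, 0.1)
print(round(age_ratio, 2), round(duration_ratio, 2))
```

Both ratios come out a little under 2 (about 1.83 and 1.82), matching “nearly twice.” Under the same assumptions, `iq_change(9, 4, 0.1)` gives about 3.1 points, within the author's estimate below of “only about 4 IQ points” in four years.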

If the analysis presented in this paper of the difficulty of detect- 
ing changes in intelligence by retesting within comparatively 
short intervals is correct, then it would seem that we cannot 
measure such important changes as a ten-per-cent improvement 
in intelligence and a twenty-per-cent transfer from reading 
ability to mental test performance when the interval between 
test and retest is less than one year. In the hypothetical illus- 
trations given above in which gains amounted to only 1 IQ point 
in one year, the largest gain we can hope for in four years is only 
about 4 IQ points. By that time so many other factors may 
have influenced the IQ that one hesitates to single out any par- 
ticular factor to which to ascribe the cause of the change. 


REFERENCES 


(1) E. R. Balken and S. Mauer: “Variations in psychological measurements
associated with vitamin B complex feeding in young children.”
J. Exp. Psychol., Vol. xvii, 1934, pp. 85-92.

(2) F. N. Freeman: Mental Tests. Boston: Houghton Mifflin Co., 1939.

(3) M. F. Fritz: “The effect of diet on intelligence and learning.” Psy.
Bull., Vol. xxxii, 1935, pp. 355-363.

(4) F. L. Goodenough: “Some special problems of nature-nurture research.”
Thirty-ninth Yearbook N.S.S.E., 1940, pp. 367-384.

(5) J. W. Hawthorne: “The effect of improvement in reading on intelligence
test scores.” J. Ed. Psy., Vol. xxvi, 1935, pp. 41-51.

(6) E. Lowry: “Increasing the IQ.” Sch. and Soc., Vol. xxxv, 1932, pp.
179-180.

(7) Janet Matthew and Bertha Luckey: “Notes on factors that may alter
the intelligence quotient in successive examinations.” Twenty-seventh
Yearbook N.S.S.E., Part I, 1928, pp. 411-419.

(8) Quinn McNemar: “A critical examination of the University of Iowa
studies of environmental influences upon the IQ.” Psy. Bull., Vol.
xxxvii, 1940, pp. 63-92.

(9) Margaret A. Mellone: “An investigation into the relationship between
reading ability and the IQ as measured by a verbal group intelligence
test.” Br. J. Ed. Psy., Vol. xii, 1942, pp. 128-135.

(10) P. Orata: “Recent research studies on transfer of training with implications
for the curriculum, guidance, and personnel work.” J. Ed.
Res., Vol. xxxv, 1941, pp. 81-101.

(11) L. E. Poull: “The effect of improvement in nutrition on the mental
capacity of young children.” Child Develpm., Vol. ix, 1938, pp.
123-126.

(12) A. Richey: “The effects of diseased tonsils and adenoids on intelligence
quotients of two hundred four children.” J. Juv. Res., Vol. xvi,
1934, pp. 1-4.

(13) R. L. Thorndike: “The ‘constancy’ of the IQ.” Psychol. Bull., Vol.
xxxvii, 1940, pp. 167-185.

(14) E. L. Thorndike: “Mental discipline in high-school studies.” J. Ed.
Psy., Vol. xv, 1924, pp. 1-22, 83-98.














BOOK REVIEWS 


George D. Stoddard. The Meaning of Intelligence. New
York: The Macmillan Company, 1943, pp. 504.


It is unfortunate for the advancement of science when differ- 
ences in point of view or in methods of interpretation lead to 
heated controversy rather than to coöperative effort. The much
publicized reports of the Iowa investigators and their students, 
with their extravagant claims of the extent to which mental 
exercise brought about by a stimulating environment can raise 
intellectual ability, exemplify the point in a striking way. 

Burks,¹ in two analyses which are admirable both for their
accuracy and their tolerant attitude, has made a laudable suggestion
to the effect that the controversy over the Iowa reports may
well have its termination in an attempt to appraise both sides of
the case, with a view to clearing up reasons for differences in
interpretation, with admission of known errors, thus paving the
way for coöperative effort. The project of salvaging what may
be of value in the extensive data of the Iowa studies by the use of sound
methods of interpretation is suggested. Yet Burks cautions, for
example, with reference to the Iowa reports by Skodak, that 
many of Dr. Skodak’s inferences are altogether unjustified by the 
data, and that she offers ‘‘interpretations, conclusions and social 
implications whose references cannot be found either in her own 
material or in any other with which the reviewer is familiar.” 
If such a salvage project is the hope, there is little encouragement 
to be found in this recent broadcasting of the original Iowa inter- 
pretations, which constitute the essence of Stoddard’s book on 
The Meaning of Intelligence. 

Doubtless the large majority of educational psychologists
would agree that the scientific movement in education of the
present century has made a very material and lasting contribution
to the interpretation of intellectual ability, and to the
improvement of learning adjustments in the schools of the
nation. Admission that the scientific mode of approach, with
its emphasis upon the use of objective tools of measurement,
needs supplementing and checking by the use of other methods of
appraisal and approach is by no means to imply that the use of
the scientific method is to be discredited rather than further
developed. Judged from the standpoint of its contribution to
the cumulative store of sound insight into this area, Stoddard’s
book is much more likely to add fog and detriment to the problem
than illumination. The general attitude is that of casual
attack upon the best that has been built up over an extended
period of years in intelligence testing. It would seem that there
are some educational theorists who would like to be as unhampered
in their educational interpretations as the child was expected to be
in the ‘child-centered school’ of a decade or more ago. Adherence
to sound use of tests and to sound statistical methods of
interpretation is apparently found to be too restricting.

¹ Burks, B. S.: Jour. Educ. Psychol., Oct. 1939, pp. 548-555; Jour. Abnor.
and Social Psychol., July, 1940, pp. 457-462.

The method used by Stoddard with a view to securing the
desired freedom is twofold: first, that of flooding the field with
extensive quotations from the Iowa studies, as though the erroneous
interpretations contained in them had never been pointed
out at all; and, second, that of casting casual discredit upon the
use of the most carefully constructed and authentic tools of
measurement available. For example, McNemar,¹ whose analysis of
the startling fallacies involved in the Iowa reports has been the
most extensive and adequate of all of the many critics, partly
because he had the advantage of access to some of the original
Iowa data, is not mentioned at all. Likewise, the wealth of
investigations which refute the Iowa conclusions is hardly
mentioned at all, or is erroneously interpreted. The conclusions of
the Iowa studies are cited under conditions which make critical
consideration by the reader of the validity of their interpretations
practically impossible. In this way sanction is given to the use
of methods of research in this difficult area which ignore many
of the most important safeguards of valid inference of intellectual
ability. Infant tests are accepted as valid though the careful
work of Bayley,² Anderson³ and others has shown that they are
not valid for the purposes for which the Iowa investigators use
them. It is little wonder that the Iowa investigators find reason
for discounting the value of the IQ when they so flagrantly
disregard the factors which have been laboriously built up to
safeguard its claim to validity and usefulness.

¹ McNemar, Q.: Psychol. Bull., Feb. 1940, pp. 63-92. (Note also the
reply to McNemar’s criticisms in the same issue.)

² Bayley, N.: Thirty-ninth Yearbook, National Society for the Study of
Education, Part II, 1940, p. 12.

³ Anderson, J. E.: Jour. Psychol., Vol. vii, 1939, pp. 351-379.

Nor does Stoddard qualify as a positive contributor in his 
attempt to point out unrealized limitations of intelligence tests 
when used as prescribed. The medical man’s instruments need 
not be discredited just because there are a few doctors who 
misuse them, or who misinterpret the data they supply, in their 
attempts at diagnosis and prediction. Better discard the unre- 
liable doctors. The leaders in extensive and adequate experi- 
mental investigation, as exemplified by Terman, Thorndike and 
others, have been foremost in uncovering important limitations 
of the tools of intellectual measurement developed to date, with a 
view to further improvement of them and of their legitimate use. 
This has been followed by revision and extension of the tests as 
indicated. But Stoddard’s attack is characterized by extensive 
quotations from critics whose opinions are not based upon 
adequate experimental data, or upon recognized expertness in 
this specialized field. 

One who is interested in adequacy of statistical and psycho- 
logical analysis of classical studies of the nature-nurture problem, 
would do well to compare Stoddard’s casual critical comment 
upon such well-known investigations as those by Burks and by 
Leahy, with the masterly and detailed analysis of the same 
studies by Shuttleworth.¹ The contrast is astonishing and
most illuminating. The nature-nurture problem is one which 
calls for the use of data suited to its purpose, and often 
for highly technical statistical analysis. These safeguards are 
conspicuously lacking in the data utilized so extensively and so 
ill-advisedly by Stoddard. 

A crucial problem which should be dealt with in the field
covered by Stoddard is the explanation of ‘why’ mental exercise
as stimulated in a good environment increases the general
intellectual ability, in terms of established psychological principles.
No definite analysis of the nature of the mental exercise brought
about, of what is gained by the exercise, or of the reason for the
alleged breadth and startling significance of the transfer of
the improvement brought about is to be found, other than the
vaguest generalities. The interpretations are altogether out of
line with the extensive work which has been reported upon the
transfer of training.

¹ Shuttleworth, F. K.: Jour. Educ. Psychol., Vol. xxvi, 1935, pp. 561-578;
655-681.

There is little doubt that, in the long run, the elusive fallacies
which are hidden from the view of the casual and uncritical
reader will be revealed. Yet it is well to remember that it took
at least fifteen years for the fallacies in the experiments of John
B. Watson upon the significance of early environment to come
to light in all their superficiality. There are significant
similarities between the appeal of Stoddard and that of Watson. Both
brought to bear the weight of professional prestige. Both wrote
in a readable style with a popular appeal which capitalized upon
popular prejudice. Both made interpretations altogether
unwarranted by the experimental evidence. A typical illustration of
such unwarranted interpretation appears in Stoddard’s admission:
“It is painfully clear that some children do not improve, and
that other children, under apparently excellent environmental
conditions, may continue to deteriorate.” This admission is in
accordance with the facts; what is at fault is Stoddard’s
interpretation of these facts, namely, that the excellent
environment alone is responsible for the improvement of
those who improve, but not for the deterioration of those who
show an equally marked decline. Surely those governed in their
thinking by the ordinary principles of logic would infer, and
search for, the operation of factors other than the influence of
the stimulating environment that might account for either the
rise or the decline.

Moreover, Stoddard seriously clouds the nature-nurture problem
by implying that the issue hinges upon the alleged
‘constancy of the IQ.’ The leading contributors do not defend, and
have not defended, an absolute constancy of the IQ. The issue is
rather the extent to which identifiable differences in environmental
stimulation cause marked change in actual intellectual
ability. It is most unfortunate that the Iowa contributors fail
to control, or to take into account in their interpretations,
the possible influence of the many factors besides the
stimulation of mental exercise when they make their extravagant
claims about the extent to which high intellectual
ability can be developed almost irrespective of biological
inheritance. It is still more unfortunate that Stoddard lends his
influence to the promulgation of such unwarranted and harmful
claims.
BENJAMIN R. SIMPSON.
Western Reserve University.


Carl R. Rogers. Counseling and Psychotherapy. Cambridge,
Mass.: Houghton Mifflin Co., 1942, pp. 450.


That counseling has much to gain from clinical experience and
insight has become increasingly clear. But to date the full
clinical emphasis has registered in relatively few places and with
relatively few of the people who are doing counseling work. With a
number of outstanding exceptions, the dominant emphasis in counseling
has been on test giving and test interpretation. The need for
literature that would enable more people in counseling to benefit
from the attitudes and feelings that characterize good clinical
work is relatively urgent. Few books on the market intended to
serve the needs of the counselor, however, have been of that
nature. Hence the significance of Rogers’ new volume, Counseling
and Psychotherapy. Among the points of view that are out of line
with Rogers’ emphasis, and the methods he considers to be in
disrepute, are catharsis, the use of advice, and intellectualized
interpretation. In this book his purpose is to present a specific
style of counseling procedure suitable for people ten years of age
and older. For younger people he would probably refer the reader
to Allen’s book, Psychotherapy with Children, as representing the
same general outlook but applying less verbal procedures for
releasing expression.

The material in the book is presented in four parts. The first,
called “An Overview,” includes a consideration of the
place of counseling and a discussion of all the new viewpoints in
counseling. In Part II, called “Initial Problems Faced by the
Counselor,” he discusses ways of determining when counseling is
indicated, the creation of a counseling relationship, and the
difference between directive and non-directive approaches. In
Part III he discusses “The Process of Counseling” itself. Here
the reader is given both a discussion and an illustration
of the meaning of therapeutic counseling as interpreted by the
author. This part considers ways of releasing expression, the
achievement of insight, and the closing phases of therapy, as well
as some practical questions concerning the counseling process,
such as the length of counseling interviews, what the counselor
should do about broken appointments, how the counselor may react
to false statements made by clients, and other related problems.
In short, the whole process can be said to differ from older
methods in that it is client-centered.
Part IV is devoted entirely to the presentation of a case, “The
Case of Herbert Bryan,” to illustrate the counseling situation
and the growth that can take place in it over eight successive
interviews. In these interviews various phases of the counseling
process are illustrated and commented on.

In general, the style of writing is straightforward and clear,
and the meaning of the material is substantially clarified
by the abundance of interview material included in the book.
The phonographic recordings and the typescripts which have
been made from them serve as illustrative material and make the
presentation of the meaning of counseling more objective. In
this procedure Rogers sees much promise for the future. There
are a number of phases of the topic discussed by the author that
it would be profitable to question and consider, but space does
not permit this here. One illustration is the question of how many
contacts are adequate for counseling therapy. He contends that
“if free expression is unhindered by counselor bungling, if
emotionalized attitudes are accurately recognized, if insight is
increased by well-selected interpretations, the client is likely
to be able to handle his own affairs after six to fifteen contacts,
rather than fifty.” The case of Herbert Bryan, which is spread
over a third of the book, is completed in eight. But there are
many cases in actual daily practice where six sessions may be too
long and fifteen sessions may be much too short. In some cases
it may be advisable to start with relatively frequent sessions and
space them out as soon as the person can carry on with less. There
are many individuals whose problems are of an immediate nature,
and for them coming once a week means six days in which they
cannot discuss their problem with anybody, although they urgently
need help in structuring their lives. Is there any sense in not
seeing those people more often when they need it? Situational
therapy is needed in many instances, and if one or the other party
thinks in the meantime that he just cannot take it any longer, is
the situation helped by forcing the sessions to be infrequent? Where
in some instances the growth of insight is slow and the growth of
courage to outgrow immature patterns difficult, is it the business
of the therapist to advise clients that there have already been
fourteen sessions and they ought to be finished in fifteen? Where
is the client emphasis in that kind of procedure? Deciding on
the number of contacts and imitating the pattern of counseling
procedure illustrated in the Bryan case may, in the hands of
clinically inexperienced individuals, bring about more confusion
than help. The criterion that the reviewer believes Rogers would
approve is still the growth of the individual and the ability of
the individual to live life well on his own without feeling the
need for therapeutic support. That, and not the number of sessions,
whether too large or too small, should be the determining feature
of therapy. It may be a fact that the Freudians have made a fetish
of long sessions, but it is equally possible that the disciples of
Rogers, if not Rogers himself, will make a fetish of a six-to-fifteen
formula. A more advisable attitude for clinicians with open eyes as
well as open minds would be summed up in this slogan:
Not too long with Freud or too short with Rogers, but be ready for
all comers regardless of their problems, and take time to look
before you leap into a program that is either too long or too short.
In a word, to use the title of Margaret Mead’s recent book, the
best formula would be “And Keep Your Powder Dry!”
H. MELTZER.
Psychological Service Center, St. Louis, Missouri.












